
Calhoun: The NPS Institutional Archive
DSpace Repository



Theses and Dissertations 


1. Thesis and Dissertation Collection, all items 


2017-09 

Test and evaluation of an image-matching 
navigation system for a UAS operating in a 
GPS-denied environment 

Han, Keng Siew Aloysius

Monterey, California: Naval Postgraduate School 
http://hdl.handle.net/10945/56131 
Copyright is reserved by the copyright owner. 

Downloaded from NPS Archive: Calhoun 



Dudley Knox Library
http://www.nps.edu/library


Calhoun is the Naval Postgraduate School's public access digital repository for
research materials and institutional publications created by the NPS community.
Calhoun is named for Professor of Mathematics Guy K. Calhoun, NPS's first
appointed, and published, scholarly author.

Dudley Knox Library / Naval Postgraduate School 
411 Dyer Road / 1 University Circle 
Monterey, California USA 93943 







NAVAL POSTGRADUATE SCHOOL

MONTEREY, CALIFORNIA 

THESIS 


TEST AND EVALUATION OF AN IMAGE-MATCHING 
NAVIGATION SYSTEM FOR A UAS OPERATING IN A 
GPS-DENIED ENVIRONMENT 

by 

Keng Siew Aloysius Han 
September 2017 

Thesis Advisor Oleg Yakimenko 

Co-Advisor Ryan Decker


Approved for public release. Distribution is unlimited. 







REPORT DOCUMENTATION PAGE 

Form Approved OMB No. 0704-0188

Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instruction, 
searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments 
regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden to Washington 
Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and
to the Office of Management and Budget, Paperwork Reduction Project (0704-0188) Washington DC 20503. 

1. AGENCY USE ONLY (Leave Blank) 2. REPORT DATE 3. REPORT TYPE AND DATES COVERED 

September 2017 Master’s Thesis 09-22-2016 to 09-22-2017 

4. TITLE AND SUBTITLE 

TEST AND EVALUATION OF AN IMAGE-MATCHING NAVIGATION SYSTEM FOR 
A UAS OPERATING IN A GPS-DENIED ENVIRONMENT 

5. FUNDING NUMBERS 

6. AUTHOR(S) 

Keng Siew Aloysius Han 

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 

Naval Postgraduate School 

Monterey, CA 93943 

8. PERFORMING ORGANIZATION REPORT 
NUMBER 

9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES) 

Defence Science and Technology Agency 

10. SPONSORING / MONITORING 

AGENCY REPORT NUMBER 

11. SUPPLEMENTARY NOTES 

The views expressed in this document are those of the author and do not reflect the official policy or position of the Department of
Defense or the U.S. Government. IRB Protocol Number: N/A. 

12a. DISTRIBUTION / AVAILABILITY STATEMENT 

Approved for public release. Distribution is unlimited. 

12b. DISTRIBUTION CODE 

13. ABSTRACT (maximum 200 words) 

Without corrective updates from the Global Positioning System, navigational capabilities are degraded significantly when the inertial 
navigation system becomes the only source of an unmanned aerial vehicle’s movement estimate. Today, unmanned vehicles are easily 
equipped with a variety of passive sensors, such as video cameras, due to their increasingly lower prices and improvements in sensor 
resolution. The concept of using an image-matching technique on an input video camera stream was demonstrated earlier with real 
flight data using a single low-grade onboard sensor. This technique works by matching the stream of data from the camera with a 
pre-stored depository of geo-referenced reference images to estimate the current attitude and position of an unmanned aerial vehicle 
(UAV). Preliminary results indicated that unfiltered position estimates can be accurate to the order of roughly 100 meters when flying 
at two kilometers above the surface and unfiltered orientation estimates are accurate to within a few degrees. This thesis examines 
developed algorithms on a suite of video data, seeking to reduce the errors in estimating attitude and position of a UAV. The data sets 
collected at King City and Camp Roberts, California, are also studied to discover the effect of altitude, terrain pattern, elevation map, 
light conditions, age of reference data and other parameters on estimation. This thesis concludes that in the absence of other sources 
of navigational information, imagery from a camera is a viable option to provide positional information to a UAV. 

14. SUBJECT TERMS 

image-matching algorithm, GPS-denied environment, UAS, UAV 

15. NUMBER OF PAGES: 109

16. PRICE CODE

17. SECURITY CLASSIFICATION OF REPORT: Unclassified

18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified

19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified

20. LIMITATION OF ABSTRACT: UU

NSN 7540-01-280-5500 Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. 239-18





Approved for public release. Distribution is unlimited. 


TEST AND EVALUATION OF AN IMAGE-MATCHING NAVIGATION SYSTEM 
FOR A UAS OPERATING IN A GPS-DENIED ENVIRONMENT 


Keng Siew Aloysius Han 

Civilian, Defence Science and Technology Agency 
M.A., University of Cambridge, 2010 
MPhil, University of Cambridge, 2010 
B.A., University of Cambridge, 2009 


Submitted in partial fulfillment of the 
requirements for the degree of 


MASTER OF SCIENCE IN SYSTEMS ENGINEERING 

from the 

NAVAL POSTGRADUATE SCHOOL 
September 2017 


Approved by: Oleg Yakimenko 

Thesis Advisor 


Ryan Decker, United States Army Armaments 
Graduate School at Picatinny Arsenal 
Co-Advisor 


Ronald Giachetti 

Chair, Department of Systems Engineering 






ABSTRACT 


Without corrective updates from the Global Positioning System, navigational capabilities 
are degraded significantly when the inertial navigation system becomes the only source of 
an unmanned aerial vehicle’s movement estimate. Today, unmanned vehicles are easily 
equipped with a variety of passive sensors, such as video cameras, due to their increasingly 
lower prices and improvements in sensor resolution. The concept of using an image-matching
technique on an input video camera stream was demonstrated earlier with real
flight data using a single low-grade onboard sensor. This technique works by matching the 
stream of data from the camera with a pre-stored depository of geo-referenced reference 
images to estimate the current attitude and position of an unmanned aerial vehicle (UAV). 
Preliminary results indicated that unfiltered position estimates can be accurate to the order 
of roughly 100 meters when flying at two kilometers above the surface and unfiltered 
orientation estimates are accurate to within a few degrees. This thesis examines developed 
algorithms on a suite of video data, seeking to reduce the errors in estimating attitude and 
position of a UAV. The data sets collected at King City and Camp Roberts, California, are 
also studied to discover the effect of altitude, terrain pattern, elevation map, light conditions, 
age of reference data and other parameters on estimation. This thesis concludes that in the 
absence of other sources of navigational information, imagery from a camera is a viable 
option to provide positional information to a UAV. 





Table of Contents 


1 Introduction
1.1 Background
1.2 Motivation and Problem Definition
1.3 Organization of the Thesis

2 Relevant Concepts
2.1 Overview of Computer Vision Navigational Techniques
2.2 Satellite Imagery and Digital Elevation Map
2.3 Reference Conventions
2.4 Random Sample Consensus Algorithm for Outliers
2.5 Estimating System State Using Kalman Filters

3 Image-Matching Paradigm
3.1 IMMAT System Architecture
3.2 Generating the Reference Image Library
3.3 In-Flight Phase
3.4 Filtering Image-Matching Algorithm Output with a Kalman Filter
3.5 General Observations

4 Test and Evaluation Setup and Procedures
4.1 Test Equipment and Data Collection Procedures
4.2 Actual Flight Data Collection
4.3 Preliminary Steps
4.4 Measures of Performances
4.5 Meters per Pixel Resolution

5 Data Analysis
5.1 Performance of Algorithm at Different Altitudes
5.2 Effect of Reference Image Field-of-View
5.3 Drop-Rates of the Image Matching Algorithm
5.4 Distribution of Image Matching Predictions
5.5 Analyzing Data Generated at Various Altitudes and in Different Flight Directions

6 Conclusions and Future Research
6.1 Summary of Work Done
6.2 Future Development

Appendix A TASE 200 Output Data
Appendix B Satellite Images Meta-data
Appendix C Schematic of MATLAB Program Flow

List of References
Initial Distribution List







List of Figures 

Figure 1.1 GPS constellation.
Figure 1.2 GPS triangulation.
Figure 1.3 Image-matching navigation functional decomposition.

Figure 2.1 A hyperbolic reflector of a catadioptric sensor capturing an omnidirectional view of the surroundings.
Figure 2.2 Examples of catadioptric sensors.
Figure 2.3 Catadioptric mathematical model.
Figure 2.4 World and UAV frames of reference.

Figure 3.1 Schematic of the image-matching algorithm workflow.
Figure 3.2 Creating a nominal trajectory.
Figure 3.3 Creating a Reference Frame.
Figure 3.4 Insufficient matches.
Figure 3.5 Insufficient inlier matches due to vibration.
Figure 3.6 Example where the Reference Frame scene does not match with Camera Image.
Figure 3.7 A coarse correspondence is found between features in the Reference Frame and Camera Image before MSAC outlier exclusion.
Figure 3.8 A montage of reference and camera image after MSAC outlier culling.
Figure 3.9 Feature extraction and outlier culling.
Figure 3.10 Projecting camera frame features (illustrated as corners for simplicity) onto ground in UTM coordinates from an initial estimated state [Easting, Northing, Up, Roll, Pitch, Yaw].
Figure 3.11 Viewfinder corner projection to ground.
Figure 3.12 Unconstrained versus constrained optimization for estimating UAV position and attitude.
Figure 3.13 Sample output for a typical trajectory.
Figure 3.14 Distribution of control-point pairs.
Figure 3.15 Cumulative histogram of control-point pairs.
Figure 3.16 Rate of convergence for attitude and pose using seven control-points.
Figure 3.17 Rate of convergence for attitude and pose using two control-points.
Figure 3.18 Before and after culling points that were generated with insufficient control-point pairs between reference image and camera image.

Figure 4.1 Sample images of Camp Roberts' terrain.
Figure 4.2 Altitude profile versus camera frame number.
Figure 4.3 Full flight profile for Camp Roberts data collection.
Figure 4.4 Flight site at west of King City, California.
Figure 4.5 Flight profile at west of King City, California.
Figure 4.6 3D flight profile at west of King City, California.
Figure 4.7 Sample images of the terrain over the area between Greenfield, California and King City, California.
Figure 4.8 King City track segments.
Figure 4.9 King City track segments in Lat-Lon view.
Figure 4.10 Correcting viewing target for data collected at Camp Roberts.
Figure 4.11 Horizontal and vertical ground resolutions.
Figure 4.12 Plot of distance per pixel in camera for Camp Roberts flight.
Figure 4.13 Plot of distance per pixel in camera for King City flight.

Figure 5.1 Perspective view of the terrain.
Figure 5.2 Camp Roberts error plots at various altitudes with Reference Frames matching at 1x FOV.
Figure 5.3 Camp Roberts error plots at various altitudes with Reference Frames matching at 2x FOV.
Figure 5.4 Camp Roberts error plots at various altitudes with Reference Frames matching at 3x FOV.
Figure 5.5 King City error plots at various altitudes with Reference Frames matching at 1x FOV.
Figure 5.6 King City error plots at various altitudes with Reference Frames matching at 2x FOV.
Figure 5.7 King City error plots at various altitudes with Reference Frames matching at 3x FOV.
Figure 5.8 Down leg track output for three different FOV sizes for Reference Frames.
Figure 5.9 Generally low feature counts for King City trajectories.
Figure 5.10 Salinas River a prominent and distinctive landform.
Figure 5.11 Different field-of-views used in Reference Images generation.
Figure 5.12 1x FOV for Reference Images generation.
Figure 5.13 2x FOV for Reference Images generation.
Figure 5.14 3x FOV for Reference Images generation.
Figure 5.15 Sample drop rates of IMMAT algorithm for Camp Roberts flights at various altitudes.
Figure 5.16 Typical appearance of an IMMAT output by unconstrained search.
Figure 5.17 Typical appearance of an IMMAT output by constrained search.
Figure 5.18 Featureless terrain.

Figure 6.1 Simulation of an urban environment by Urban Redevelopment Authority of Singapore.

Figure C.1 Schematic for CreateSatelliteImageryAndTransforms.m.
Figure C.2 Schematic for GenerateNominalTrajectory.m.
Figure C.3 Schematic for GenerateReferenceFrames.m.
Figure C.4 Schematic for ImageMatchingAlgorithm.m.








List of Tables 

Table 1.1 Functional decomposition for the image-matching navigation task.
Table 2.1 Characteristics of the ASTER digital elevation model.
Table 3.1 Seven control-points estimate errors.
Table 3.2 Two control-points estimate errors.
Table A.1 TASE Meta-data available for analysis.
Table B.1 Meta-data for the satellite tiles downloaded for King City.



List of Acronyms and Abbreviations 


AGL      Above Ground Level
ASTER    Advanced Spaceborne Thermal Emission and Reflection Radiometer
BRISK    Binary Robust Invariant Scalable Keypoints
DEM      Digital Elevation Model
DOD      United States Department of Defense
EO       Electro Optics
GPS      Global Positioning System
IMU      Inertial Measurement Unit
INS      Inertial Navigation System
IR       Infrared
MOP      Measure of Performance
MSAC     M-estimator SAmple Consensus
MSL      Mean Sea Level
IMMAT    Image-Matching
RANSAC   Random Sample Consensus
RIL      Reference Image Library
RVI      Reference View Image
SIFT     Scale-Invariant Feature Transform
SURF     Speeded-Up Robust Features
UAV      Unmanned Aerial Vehicle
UTM      Universal Transverse Mercator



Executive Summary 


The U.S. Department of Defense’s Unmanned Systems Integrated Roadmap FY2011-2036 
[1] identified autonomous operations within a Global Positioning System (GPS)-denied 
environment as a key area of research, and this thesis studies the use of image-matching
techniques to provide positional information in such a situation. Navigation systems within 
unmanned vehicles today are largely reliant on updates from the GPS and, in more capable 
systems, on the inertial navigation system (INS) as well. Within a GPS-degraded or 
GPS-denied environment on Earth or other planets, navigational capabilities are degraded 
significantly because the INS becomes the only source of a vehicle’s movement estimate. 
Numerous unmanned vehicles today can be, and often are, easily equipped with other passive
sensors such as video cameras, as these devices have increasingly lower prices and improved 
sensor resolution. Such alternative sources of information can be used to work out the 
movement of the vehicle with respect to the operating environment. In the instance of video 
cameras, vision-based techniques can be harnessed for use as a navigation aid. Specifically, 
image-matching techniques rely on the stream of data from the cameras and a pre-stored 
depository of geo-referenced reference images to estimate the current attitude and position 
of a drone in flight. 

In a 2016 work by Yakimenko and Decker [2], the researchers tested the concept of
image-matching navigation on two different platforms using a single low-grade onboard sensor.
Their preliminary results indicated that unfiltered position estimates were accurate to the
order of roughly 100 meters when flying at two kilometers above mean sea level, while the
unfiltered orientation estimates were accurate to within a few degrees. This thesis extends
the work by studying the errors associated with the estimated attitude and terrain versus 
the actual recorded GPS position during data collection flights conducted at King City 
and Camp Roberts in California. Various parameters that can affect the image-matching 
navigation algorithm performance are also studied at different altitudes and in two different 
terrains. 

Five major observations from the conducted evaluations are as follows. 


1. The Image-Matching (IMMAT) approach relies on the feature-richness of both satellite
and onboard camera images. To this end, a typical satellite image provides a
resolution of 0.5 m² per pixel regardless of the size of the ground footprint. The
resolution of the onboard camera depends on the field-of-view (FOV, or zoom setting),
altitude, and attitude. The best resolution is achieved in straight and level flight at low
altitude with maximum zoom. Nevertheless, such a setting results in a very
narrow field of view, which significantly reduces the number of features that can be used
to match those of the satellite image. Specifically, with the TASE-200 sensor used in
this research and a field-of-view of 35 degrees (Camp Roberts flights), a resolution
of 0.5 m² per pixel can be achieved only when flying below 400m AGL. Likewise, for
the King City flights, where the videos were taken at a field-of-view of 10 degrees, only
flights below 1200m can achieve a resolution of 0.5 m² per pixel.

2. The texture of the Earth’s surface has a major role. Specifically, flying over the 
agricultural area consisting of crop fields (between Greenfield and King City) at low 
altitudes with a narrow field-of-view results in no features detected in the onboard 
camera field-of-view. Some features can be detected only when flying in between the 
crop fields. One way to mitigate this effect might be increasing the field-of-view, but 
that leads to a decrease in resolution and possible failure to find the matches between 
two different resolution images. Still, this approach is worth exploring in the future. 

3. Onboard camera stabilization (i.e., suppression of vibrations) has a crucial role, as
well. In this research two aerial vehicles were used. The same sensor, a TASE-200,
had much better stabilization when flying on a UAV at 25m/s than on a
manned Cessna-206 flying twice as fast.

4. Variation in terrain elevation also affects the accuracy of the IMMAT navigational
solution. This implies a requirement to have a detailed terrain elevation map of the
intended area of operations.

5. Aircraft attitude plays a major role, as well. In this research, IMMAT performance
was evaluated only for straight and level flight. Future evaluation should consider IMMAT
performance while turning, climbing and descending.
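
Observation 1 ties the camera's ground resolution to field-of-view and altitude. The relation can be sketched as follows, under simplifying assumptions that are not taken from the thesis: flat terrain, a camera pointing straight down, and a hypothetical image width `pixels_across` (not a TASE-200 specification). The function returns the linear ground distance spanned by one pixel.

```python
import math


def ground_resolution_m_per_px(height_agl_m, fov_deg, pixels_across):
    """Approximate ground distance covered by one pixel.

    Assumes flat terrain and a nadir-pointing camera; `pixels_across`
    is a hypothetical image width in pixels.
    """
    # Width of the ground footprint seen across the full field of view.
    footprint_m = 2.0 * height_agl_m * math.tan(math.radians(fov_deg) / 2.0)
    return footprint_m / pixels_across


def max_height_for_resolution(target_m_per_px, fov_deg, pixels_across):
    """Invert the relation: highest altitude that still meets a target resolution."""
    return target_m_per_px * pixels_across / (
        2.0 * math.tan(math.radians(fov_deg) / 2.0)
    )


if __name__ == "__main__":
    # Illustrative values only; 640 pixels is an assumed image width.
    for fov in (35.0, 10.0):
        for height in (400.0, 1200.0):
            res = ground_resolution_m_per_px(height, fov, pixels_across=640)
            print(f"FOV {fov:4.1f} deg at {height:6.0f} m AGL: {res:.3f} m per pixel")
```

Under these assumptions the resolution scales linearly with altitude and with tan(FOV/2), which is why a narrower field-of-view permits a higher altitude for the same ground resolution.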


Using a limited set of test data based on a (not high-end) TASE-200 sensor with some
vibration isolation problems, along with incorrect reporting of pan-tilt information (which
was discovered within this research effort and reported to the manufacturer), resulted in an
unusually high drop rate. Drops occurred when there were not enough matching points to
construct a projective transformation, which is the basis of the IMMAT approach. Nevertheless,
this thesis was able to conduct a detailed assessment of the overall performance of the
IMMAT algorithm.

The main conclusion is that when all conditions are met (i.e., at least five matching points
are found), the IMMAT algorithm can provide an estimate of an aerial vehicle's position
that is accurate to within 50m of its true position (this value correlates with the satellite
image resolution), and determine the vehicle's attitude to within ±15 degrees for pitch and
roll, while finding its yaw angle to within ±2 degrees.
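
To illustrate why a minimum number of matching points is needed, the following is a minimal sketch of fitting a projective transformation (homography) to matched point pairs by linear least squares. It is an illustration only and not the thesis's MATLAB implementation; the function names are hypothetical. A projective transformation has eight unknowns once its last entry is fixed to 1, and each point pair contributes two equations, so four pairs in general position are the mathematical minimum; additional pairs, such as the five required above, give an overdetermined fit.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_homography(src, dst):
    """Least-squares 3x3 projective transform mapping src to dst (h33 fixed to 1)."""
    if len(src) != len(dst) or len(src) < 4:
        raise ValueError("need at least four point pairs")
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), and similarly for v.
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    # Normal equations (A^T A) h = A^T b for the eight unknowns.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(ata, atb) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, point):
    """Map a 2-D point through the homography h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (
        (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
        (h[1][0] * x + h[1][1] * y + h[1][2]) / w,
    )
```

In practice, matched features contain outliers, so a robust estimator such as RANSAC or MSAC would normally wrap a fit of this kind; that step is omitted here.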

Some additional observations follow. 

• For the same field of view, as the flight profile increases in altitude, allowing more of
the local terrain to be captured, with a consequential increase in the number of features
and the likelihood of matches, the drop rate for the IMMAT algorithm decreases.

• If an IMMAT drop does not occur, then the error associated with IMMAT estimation
appears to decrease with the altitude or pixel-per-meter on the ground.

• This thesis relies on a simple two-dimensional projection of satellite imagery into
the view of a would-be camera in flight. The lack of elevation data introduces
perspective differences that may contribute to the errors in estimation by the IMMAT
algorithm. To quantify the errors due to projection further, two experiments can be
conducted. First, real video imagery can be taken at various tilt angles, with the most
important being vertically downward. The downward view matches best with the
top-down satellite view and also obviates the need for terrain elevation information for
projection purposes. The second is to enhance the projection algorithm by capturing
a view from a three-dimensional satellite-image textured digital elevation model from
the perspective of the camera, and comparing the estimates with the current approach.

• While the Reference Image Library can be created from a large collage of
high-resolution satellite images prior to flight and then stored onboard the UAV, it can
require quite a bit of space to store the frames. For example, a nominal trajectory that
requires about 700 reference frames stored in high resolution amounted to 0.5GB;
storing only the extracted features and using only those will require much less space.
This presents an opportunity to investigate a method for storing the Reference Image
Library that can work with the IMMAT algorithm efficiently.

• As the IMMAT algorithm produces an estimate frame-by-frame and only when
sufficient matches are found, there will be variations in the estimates generated when
they are produced; otherwise, there are no estimates. The question is whether feeding
the output of the IMMAT algorithm into a Kalman filtering process will (1) produce
a cleaner output, (2) produce more accurate positional predictions, and (3) use the
previously known positional predictions as input initial positional estimate into the
six-degrees-of-freedom optimization procedure.
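
The filtering idea raised in the last bullet can be pictured with a minimal sketch: a constant-velocity Kalman filter over a single position axis that keeps predicting through frames where no estimate is produced. This is an illustration under stated assumptions, not the filter used in the thesis; the function name, the noise variances and the initial velocity variance are all hypothetical values.

```python
def kalman_track(measurements, dt=1.0, meas_var=50.0 ** 2, accel_var=1.0):
    """Constant-velocity Kalman filter over one position axis.

    `measurements` holds one entry per camera frame; None marks a frame
    where no position estimate was produced. All default values are
    illustrative assumptions.
    """
    pos, vel = None, 0.0
    # Covariance [[p00, p01], [p01, p11]] of the (position, velocity) state.
    p00, p01, p11 = meas_var, 0.0, 100.0
    estimates = []
    for z in measurements:
        if pos is None:
            # Wait for the first available fix to initialise the state.
            if z is not None:
                pos = float(z)
            estimates.append(pos)
            continue
        # Predict: propagate state and covariance with F = [[1, dt], [0, 1]].
        pos += vel * dt
        p00 += dt * (2.0 * p01 + dt * p11) + accel_var * dt ** 4 / 4.0
        p01 += dt * p11 + accel_var * dt ** 3 / 2.0
        p11 += accel_var * dt ** 2
        if z is not None:
            # Update: only when the frame produced a position estimate.
            s = p00 + meas_var
            k0, k1 = p00 / s, p01 / s
            residual = z - pos
            pos += k0 * residual
            vel += k1 * residual
            p11 -= k1 * p01
            p01 *= 1.0 - k0
            p00 *= 1.0 - k0
        estimates.append(pos)
    return estimates
```

During a run of dropped frames the filter coasts on its velocity estimate, so an output is available for every frame after the first fix, with uncertainty that grows until the next measurement arrives.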

Overall, the work within this thesis enhances the users’ understanding of deploying IMMAT 
algorithms for guided unmanned activities that may follow a predetermined trajectory. With 
a predetermined trajectory, recently captured high-resolution images of the operational 
environment that the planned trajectory is expected to fly over can be pre-loaded onto the 
unmanned system. In this way, it can be used as an alternative navigational aid when other 
on-board navigational equipment fails or cannot be used. One specific example of where 
the findings of this investigation are useful is in autonomous military operations within a
GPS-denied environment, which renders an accurate external means of navigation unavailable
for unmanned navigation.


List of References 

[1] Department of Defense. (2011). The Unmanned Systems Integrated Roadmap FY2011-2036.
[Online]. Available: https://my.nps.edu/documents/106607930/106914584/UxV+DoD+Integrated+Roadmap+2011.pdf/0f123fb1-ef1f-4842-9855-85a136b28a93. Accessed
September 13, 2017.

[2] O. Yakimenko and R. Decker, “On the development of an image matching navigation algorithm
for aerial vehicles,” Proceedings of the IEEE Aerospace Conference, Big Sky, MT, 2016.



Acknowledgments 


I am grateful to many people for this work. First and foremost, I thank Professor Oleg 
Yakimenko for suggesting that this project would be appropriate to investigate, given that 
I would have preferred to work on something more relevant to my future job posting, and 
something more quantitative, involving some degree of coding. I thank him also for being 
a fantastic MATLAB aficionado and for having been the best lecturer for MATLAB I have 
had in the last decade. He has also helped significantly with my understanding the data sets 
that were used, as the logs from the systems used were not fully matching up with the flight 
profile due to system-setup calibration biases. 

I would also like to acknowledge Assistant Professor Ryan Decker from the United States 
Army Armaments Graduate School at Picatinny Arsenal for providing the initial code-base 
upon which the rest of my work was built, and for ideas for enhancing the algorithms 
further. I gratefully thank all my proofreaders: Michele D’Ambrosio, Ms. Barbara Berlitz, 
Charles, Anna, and my mom. Proofreading is painstaking work, requiring many hours 
to suggest reorganization and rephrasing to better present my ideas. I am also happily 
acknowledging the financial support of my sponsor, the Defence Science and Technology
Agency of Singapore. Without their support, the opportunity to pursue this master’s program 
at the Naval Postgraduate School would not have materialized. 

I also have to thank Renee, Michele, and Simone, a welcoming family who provided me 
with a comfortable living environment during this period when I was busily trying to finish 
this thesis. I will always remember this chapter of my life: learning how to cook authentic
Italian cuisine and savoring the most amazing raviolis. I had not tried anything as amazing
as the truffle ricotta ravioli up to that point in my life! I would also like to mention a very
special friend, Charcoal. He will not be able to read this, as he is the family dog,
but he kept me company after school every afternoon while I thought through my thesis.

Last but not least, it was Wee Leong and Jeremy who made this one year at NPS more
pleasant; my experiences would have been vastly different without their support.







CHAPTER 1: 

Introduction 


This chapter provides the background, context and the setting for the exploration of
image-matching algorithms for use as autonomous vehicle navigation aids. The main objective
of this chapter is to formulate the problem statement, which is presented together with the
motivation for this body of work.

1.1 Background 

Most manned and unmanned vehicles flying today rely on an integrated Inertial Navigation
System (INS) and Global Positioning System (GPS) navigation system that uses GPS to
provide corrections to vehicle position at a rate of 1 to 10 Hz [1]. A GPS receiver uses
information transmitted from at least four satellites out of a constellation of 24+ satellites
(see Figure 1.1) to compute its location (see Figure 1.2 for an illustration). The GPS signal
can become unavailable due to various natural phenomena or human action; when this happens,
it is broadly classified as “GPS denial” [2], [3].





Figure 1.1. GPS constellation. Source: [4]. 








a) With a range measurement from one satellite, the receiver is positioned somewhere
on the sphere defined by the satellite position and the range distance, r.

b) With two satellites, the receiver is somewhere on a circle where the two
spheres intersect.





Figure 1.2. GPS triangulation. Source: [5]. 
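
The range-intersection idea illustrated in Figure 1.2 can be sketched in a simplified two-dimensional form, where three known transmitter positions and three measured ranges fix a position on a plane. This is an illustration only: the function name is hypothetical, and a real GPS solution works in three dimensions and also solves for the receiver clock bias, which is why at least four satellites are needed.

```python
import math


def trilaterate_2d(beacons, ranges):
    """Locate a point on a plane from three beacon positions and ranges.

    Subtracting the first range equation from the other two removes the
    quadratic terms and leaves a 2x2 linear system, solved by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = beacons
    r1, r2, r3 = ranges
    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    b1 = r1 ** 2 - r2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    b2 = r1 ** 2 - r3 ** 2 + x3 ** 2 - x1 ** 2 + y3 ** 2 - y1 ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("beacons are collinear; position is ambiguous")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


if __name__ == "__main__":
    beacons = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    truth = (3.0, 4.0)
    ranges = [math.hypot(truth[0] - bx, truth[1] - by) for bx, by in beacons]
    print(trilaterate_2d(beacons, ranges))
```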


Without GPS positional updates to calibrate the INS, navigational capabilities quickly 
degrade when the system relies solely on the INS to drive dead reckoning estimates. The 
question at hand is whether there are alternative mechanisms, preferably sensors already 
available, which can provide another source of positional feeds into the navigational system. 
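
The speed of this degradation can be illustrated with a minimal sketch, assuming the only error source is a constant accelerometer bias along one axis (a simplification; real INS error models include many more terms). Integrating the bias twice gives a position error of 0.5 * bias * t^2, so the error grows quadratically with time. The bias value used below is a hypothetical example, not a figure from the thesis.

```python
def dead_reckoning_error(accel_bias_mps2, duration_s, dt=0.01):
    """Position error (m) accumulated by double-integrating a constant bias.

    Assumes a single axis and no other error sources; `accel_bias_mps2`
    is a hypothetical accelerometer bias in m/s^2.
    """
    vel_err = 0.0
    pos_err = 0.0
    for _ in range(int(round(duration_s / dt))):
        # Exact kinematic update for constant acceleration over one step.
        pos_err += vel_err * dt + 0.5 * accel_bias_mps2 * dt * dt
        vel_err += accel_bias_mps2 * dt
    return pos_err


if __name__ == "__main__":
    for minutes in (1, 5, 10):
        err = dead_reckoning_error(0.01, 60.0 * minutes)
        print(f"{minutes:2d} min without corrections: {err:8.1f} m")
```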

Numerous unmanned vehicles today can be, and often are, easily equipped with other sensors.
These alternative sources of information can be used to work out the movement of the
vehicle with respect to the operating environment. One such sensor is the video camera;
cameras are (1) becoming cheaper, (2) improving in sensor resolution, and (3)
getting smaller. As such, the use of a video camera as an alternative source of navigation
information is the prime focus of investigation within this thesis.

In 2016, Yakimenko and Decker [3] demonstrated that the concept of image-matching
(IMMAT) navigation shows promise with both simulated data and real flight data captured
from a single low-grade onboard sensor. Their study reported preliminary results that
unfiltered position estimates are accurate to roughly 100 meters (m) when flying at two
kilometers (km) above the Earth’s surface and unfiltered orientation estimates are accurate
to within a few degrees. Yet, further analysis is necessary to characterize the performance
and behavior of the algorithm better.

1.2 Motivation and Problem Definition 

This thesis seeks to further study the behavior of the proposed IMMAT concept.
Understanding the behavior of algorithms allows users of the algorithm to achieve more robust
performance during operations. Studies to reveal the effects of altitude, terrain pattern, 
elevation map and other parameters on IMMAT navigation algorithm performance can help 
users to better understand the promises and limitations of the IMMAT approach. 

This thesis addresses the problem of testing and evaluation of an IMMAT algorithm using 
two sets of video data collected by manned and unmanned aerial vehicles equipped with a 
representative sensor. 

The work within this thesis spans the domains of computer vision, systems engineering and
unmanned aerial vehicle navigation. The broad intent of this investigation is to develop new
techniques that process an onboard image stream or video to characterize the motion of
autonomous aerial vehicles in support of navigational tasks.

This effort contributes towards autonomous operations within a GPS-denied environment,
and the objectives are aligned with the United States Department of Defense (DOD)’s
Unmanned Systems Integrated Roadmap FY2011-2036 [6].

In order to better develop an algorithm for the image-matching navigation task, this research
conducted a functional analysis [7] as guided by systems engineering best practices. This
analysis provides greater insight into how to divide the task according to different
algorithmic procedures. A high-level schematic of the image-matching navigation task is depicted
in Figure 1.3. The sub-functions are labeled individually, and a description of each is
presented in Table 1.1. The functional decomposition subsequently helps to structure
the implementation of the IMMAT algorithm that is described in the rest of the thesis.





Table 1.1. Functional decomposition for the image-matching navigation task.

Label | Function | Description
F.0 | Image Matching Navigation | Broadly, the Image-based Matching Navigation task is about making navigational decisions relying on reference images of an operating environment that have reliable location information tagged to it. From an unknown location, pictures or images of the area are taken and then compared with the available reference images.
F.1 | Manage Ground Truth Reference Imagery | A means to manage a repository of methodically organized images is needed to facilitate the image-matching task efficiently. The library shall contain ground-truth information such as latitude and longitude (or other equivalent location referencing mechanism) of the scene. The library must be able to be updated with appropriate reference frames.
F.1.1 | Retrieving Geo-Referenced Imagery | An appropriate external source of retrievable high-quality geo-referenced imagery is needed, appropriate for the area of operations. Geo-referencing information needs to contain latitude and longitude. Having additional information such as the elevation of the ground at that point can also be useful.
F.1.2 | Create Reference Images Library | Using the geo-referenced imagery, the user needs a method to generate a number of reference image frames according to a planned trajectory, such that when the UAV flies over the planned path, the scenery can be matched with these references to derive the aerial position and pose.
F.2 | Flight Trajectory Planning | To select and create appropriate reference frames to be stored for cross-referencing, a means of path planning is necessary. The planned path will provide critical information such as latitude, longitude and altitude of UAV, the camera view point of the on-board sensor, as well as the underlying terrain height.
F.2.1 | Determine Start and End points of Flight | There shall be a means for defining the starting and ending points of a flight.
F.2.2 | Determine Nominal Pose of Camera | There shall be a means for defining the nominal roll, pitch and yaw of the camera, that is, what the camera is looking at during the flight.
F.3 | Estimating Position and Pose of Camera in-flight | This is the core function of the image-matching navigation task - to use nominal trajectory information together with the reference library and the incoming sensor stream to produce an estimation of the in-flight camera pose and location.
F.3.1 | Matching Video Frames to Reference Images | This sub-function finds the mathematical transform that would map the incoming video frames to an appropriate geo-referenced image frame and in so doing can produce the first positional estimate for the camera.
F.3.2 | Perform Optimization of Roll, Pitch, Yaw and Positional Estimates | After having the rough position of the camera, this sub-function works to reduce the amount of error within the initial estimate for all six degrees of freedom - that is the 3-dimensional position as well as the roll, pitch and yaw of the camera.







Figure 1.3. Image-matching navigation functional decomposition. 


1.3 Organization of the Thesis 

To address the problem formulated in Section 1.2, the remainder of this thesis is organized
as follows.

Chapter 2 presents a review of existing literature documenting work previously done
within the domain of the thesis. The chapter also summarizes relevant concepts
such as the applicability of satellite imagery and digital elevation models, and the
way these will be used to provide accurate geo-referenced images against which the
environment and the data can be referenced and then modeled.

Chapter 3 presents the implementation details of the algorithms used for image matching.

Chapter 4 presents the datasets and data collection process used for this project. This
chapter also provides a description of the physical system used to collect the flight
data for analysis. The results are then analyzed and discussed.

Chapter 5 provides the concluding remarks about the research described in this thesis.
The wider implications of the results on future work and what research still remains
to be done are also discussed.

Appendix A provides a full listing of all the meta-data that is made available by the camera 
used for this thesis. 

Appendix B provides the meta-data details for the satellite imagery used for this thesis. 

Appendix C provides a schematic and workflow of the MATLAB codes that were written 
for this thesis. 





CHAPTER 2: 
Relevant Concepts 


This chapter presents concepts relevant to the subject of this thesis, including
the reference frames for a UAV set against the world coordinates, image feature extraction
algorithms, and the Kalman filter.

2.1 Overview of Computer Vision Navigational Techniques

A large body of work is available pertaining to attitude estimation using various sensor
inputs [8]. The sensors relied upon variously include the inertial navigation system (INS),
onboard accelerometers, magnetometers and, most commonly today, GPS. The focus of this thesis
is on the use of the video stream that is available for most UAVs. The rest of this section
presents a review of work done within the computer vision domain for attitude estimation
of UAVs.

Mondragon et al. [11] proposed to use an omnidirectional sensor to identify a skyline and 
use it for attitude and heading estimation, noting that this system can be used as a redundant 
system for the INS and gyro-sensors. The omnidirectional sensor used in their research 
was a catadioptric video camera. Figure 2.1 shows a hyperbolic reflector capturing an 
omnidirectional view of the surrounding environment; examples of the sensors themselves 
are shown in Figure 2.2. 

Their approach requires that the image contain the horizon line, from which their proposed
algorithm segments the image to find the horizon. The detected skyline is then mathematically
modeled as an occluding contour of the Earth as a plane inside a unit sphere, where the
horizon forms a red line as the intersection of the plane of the Earth with the sphere (see
Figure 2.3). The normal to the modeled plane provides a basis to estimate the pitch and roll.
The yaw is estimated by checking registration of visual objects as they shift from frame to
frame.

Figure 2.1. A hyperbolic reflector of a catadioptric sensor capturing an
omnidirectional view of the surroundings. Source: [9].

Figure 2.2. Examples of catadioptric sensors. Source: [10].

Kong et al. [2] used a feature-based navigation technique that essentially works by comparing
features of an image with a previously taken set of reference images that are labeled with
GPS data. The images taken by the onboard camera need to be mathematically transformed
into the same plane as the reference images and then compared by feature matching. In their
study, Kong et al. proposed using features that are, as far as possible, invariant under different
lighting conditions. Their algorithm used edges extracted by the “Canny Edge Detector.”
The number of features extracted was deliberately kept small to reduce mismatch rates. A
Gaussian blur filter was applied to reduce the number of unwanted features and smooth the
edges extracted. To calculate the UAV’s position, the algorithm computed the centroid of
a feature known to exist on the reference image (in world coordinates) and the image taken
by the onboard camera. The motion could then be deduced by computing the translation
vector between them. The authors concluded that there are limitations on matching natural
features. The authors also proposed, as a next step, to accelerate the computation by moving
it onto a Field Programmable Gate Array (FPGA), as the algorithm is floating-point intensive
and highly repetitive and can therefore benefit greatly from hardware acceleration. As this thesis
also works on matching natural terrain features, any limitations in natural feature matching
will also be noted.

Yakimenko and Decker [3] proposed using high-resolution satellite images with IMMAT
algorithms to tune the position and attitude of a UAV. The proposed approach utilized
the IMMAT algorithm to match a camera position to a geo-referenced satellite image.
Broadly described, the concept is to optimize the location estimate of the features of the
real-flight image on the satellite image using a feature detection algorithm. Further details
of this approach are given in the next section of this thesis, which extends the preliminary
work previously done and described in the reviewed literature. The present work seeks to
improve the understanding of the effect of operating at various altitudes and, where possible,
to improve the accuracy of the technique.

2.2 Satellite Imagery and Digital Elevation Map 

For a source of geo-referenced imagery, the use of the DigitalGlobe satellite imagery is
introduced. Then, as the satellite imagery does not contain elevation information, elevation
information associated with the area of operations is supplemented with a digital elevation
map of the terrain from the Advanced Spaceborne Thermal Emission and Reflection
Radiometer (ASTER) Global Digital Elevation Model database.






2.2.1 Satellite Imagery 

Satellite imagery geo-referenced to latitude and longitude has been used to provide
ground truth. Yakimenko and Decker [3] earlier recommended the use of the geospatial
data provided by DigitalGlobe on the grounds that it was the most accurate library of the
Earth. As such, for this thesis, high-resolution satellite imagery was retrieved from the
DigitalGlobe website [12] for both Camp Roberts and King City, California.

DigitalGlobe’s geospatial big data (GBDX) platform provides access to 15 years’ worth of
geospatial data along with the tools and algorithms necessary to extract useful information
from that repository.

The high-resolution satellite images of the area of interest can be made by selecting the
desired image layers and then creating a mash-up image. This image collage can then be
downloaded as high-resolution tiles that can be stitched together to form a large contiguous
image of the area of interest. Each pixel within these high-resolution tiles represents a
half-meter by half-meter square on the ground. It is from the stitched high-resolution image
that reference images will be created for the Reference Image Library. The details of the
reference image creation are presented in Chapter 3.

Table 2.1. Characteristics of the ASTER digital elevation model.

Characteristic | Value
Tile Size | 3601 x 3601 (1° x 1°)
Pixel Size | 1 arc-second
Geographic Coordinate System | Geographic latitude and longitude
DEM Output Format | GeoTIFF, signed 16-bit, in units of vertical meters; referenced to the WGS84/EGM96 geoid
Special DN Values | -9999 for void pixels, and 0 for sea water body
Coverage | North 83° to South 83°, 22,702 tiles

2.2.2 Digital Elevation Map 

High-quality, geo-referenced terrain elevation data is required in order to model the effects
of the underlying terrain.

For the purposes of this thesis, the terrain models of the operating areas were retrieved
from the ASTER Global Digital Elevation Model (DEM) Version 2 database, hosted by the
United States Geological Survey (USGS) (https://www.usgs.gov/). The data is open-source
and publicly downloadable. A DEM is essentially gridded data where each square in the
grid corresponds to a geographic location, holding a value that represents the elevation
above mean sea level. In the case of the ASTER DEM, the data was stored as a gridded
(latitude, longitude, elevation) matrix within a GeoTIFF file.

To download the relevant digital elevation model, we entered the bounding latitudes and
longitudes into the USGS EarthExplorer system (https://earthexplorer.usgs.gov/) and
selected the ASTER database. The system then made available the appropriate data package
for download. The data is retrieved as a geo-referenced TIFF file with 16-bit elevation values
in vertical meters, where each pixel represents 1 arc-second by 1 arc-second (approximately
30 m by 30 m near the equator) in geographic latitude and longitude. This data was captured by
the National Aeronautics and Space Administration (NASA) Terra spacecraft’s infra-red (IR)
cameras with a 20-meter elevation accuracy at 95% confidence. The information
within the DEM is used to set the elevation of the ground to provide for better re-projection
for the creation of the Reference Image Library. Details of the DEM used in this
thesis are captured in Table 2.1.
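
The "approximately 30 m by 30 m near the equator" figure can be checked with a short calculation. The following is a minimal Python sketch, not part of the thesis code; it assumes a spherical Earth with a mean radius of 6,371 km, and the 36-degree latitude is purely illustrative. It shows that the east-west ground size of a 1 arc-second pixel shrinks with the cosine of the latitude, while the north-south size stays roughly constant.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean spherical radius (not from the thesis)


def arcsecond_ground_size(lat_deg, arcsec=1.0):
    """Return (north_south_m, east_west_m) spanned by `arcsec` arc-seconds at a latitude."""
    metres_per_arcsec = 2.0 * math.pi * EARTH_RADIUS_M / (360.0 * 3600.0)
    north_south = arcsec * metres_per_arcsec
    # Meridians converge towards the poles, so east-west spacing scales with cos(latitude).
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


ns_eq, ew_eq = arcsecond_ground_size(0.0)    # at the equator
ns_36, ew_36 = arcsecond_ground_size(36.0)   # illustrative mid-latitude
```

Under these assumptions a DEM pixel is close to 31 m square at the equator but only about 25 m wide at 36 degrees latitude.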

2.3 Reference Conventions 

For the algorithms to work, a consistent set of reference frames must be used to properly
describe the orientation of an aircraft in three dimensions around its own center of gravity,
as well as for referencing its position within the world coordinates.

This section lays out the referencing conventions used within this thesis. It introduces the
convention for describing the attitude of the UAV and the world coordinate reference frame,
with which the unambiguous location of the UAV can be described.

2.3.1 UAV Body Frame 

The UAV body frame of reference is body-fixed, with its origin at the center of gravity of
the UAV. The convention used within this thesis has +Z pointing out of the bottom of
the UAV, +X out of the nose, and +Y in the direction of the right wing; in other words,
x = north, y = east, z = down (see Figure 2.4). In that figure, the world frame of reference,
in which targets and the platform’s physical location are expressed, is depicted in latitude,
longitude and up, while in the air the UAV is illustrated with body-centered, body-fixed
reference axes following a North-East-Down convention. Although it appears counter-intuitive
to use coordinate axes that are oriented differently from the world frame, this reference
frame for the UAV allows for easier mathematical transformations when
computing rotations and translations with respect to the ground.

2.3.2 Universal Transverse Mercator Coordinate System 

The work within this thesis is primarily about estimating the location of a UAV with
respect to the Earth’s surface. To make this estimation, there is a need to unambiguously
reference a location on the surface of the Earth. This thesis uses the Universal Transverse
Mercator (UTM) Coordinate System to identify locations on the surface of the Earth, as the
units correspond to meters on the ground. This method greatly simplifies the computation
of distances in three dimensions. Further, 3DEM provides the capability to convert any
terrain using a geodetic (latitude-longitude) projection into a UTM projection. Terrain data
sources such as the NASA SRTM data and the National Elevation Dataset are provided in a
geodetic latitude-longitude projection. The disadvantage of the geodetic projection is that
it introduces an east-west distortion at high latitudes. The UTM projection corrects this
distortion, providing a more realistic map view and 3D scene. An added benefit is that
using the UTM projection is helpful in the application of terrain overlays.

Figure 2.4. World and UAV frames of reference.
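
To illustrate why metric UTM coordinates simplify distance computations, here is a minimal Python sketch (not the thesis's MATLAB code). It assumes both points lie in the same UTM zone and are given as (easting, northing, altitude) in meters; the coordinate values are hypothetical.

```python
import math


def utm_distance_m(p, q):
    """Straight-line distance in metres between two (easting, northing, altitude) points.

    Both points are assumed to lie in the same UTM zone, so a plain
    Euclidean norm is adequate over the short ranges considered here.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


uav = (700_300.0, 3_960_400.0, 500.0)      # hypothetical UAV position
target = (700_000.0, 3_960_000.0, 500.0)   # hypothetical ground reference point
slant_range = utm_distance_m(uav, target)
```

The same computation in latitude and longitude would first require a conversion from angular to metric units that depends on latitude.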

2.3.3 Image Features Extraction 

For the purposes of this thesis, image features are data found within either the satellite images
or the sensor video frames that are relevant to solving the proposed image-matching problem.

Many image feature extraction algorithms have been developed, for example, Speeded Up
Robust Features (SURF), Binary Robust Invariant Scalable Keypoints (BRISK), and Lowe’s [13]
Scale-Invariant Feature Transform (SIFT).

Although Lowe’s SIFT algorithm is effective at finding image features that remain invariant
when common image transformations are applied, this effectiveness comes at the
expense of computational cost (i.e., it is slow) [14]. By contrast, SURF, described
in 2006 by Bay et al. [15], was demonstrated to be significantly faster than SIFT and is
thus suitable for the purposes of this thesis. Similarly, BRISK [14] is another plausible
alternative that is rotation and scale invariant. It is also suitable for matching feature
sets that are likely to be transformed versions of one another, but that method is not
explored within the scope of this thesis and is left for future work.

2.4 Random Sample Consensus Algorithm for Outliers 

In this work, features are extracted from images and then a matching correspondence
between as many features as possible in two similar images is estimated. This matching may
contain outliers that do not accurately describe how the features match up with each other
in the two images. To exclude spurious matches, the estimated correspondence is run through a
Random Sample Consensus (RANSAC) algorithm. The RANSAC algorithm, first described
in 1981 by Fischler and Bolles [16], seeks to find a consensus set of inliers that can best explain the
match between two images. Briefly, the RANSAC algorithm steps through the following to
produce a model to fit the data, assuming the model has a parameter vector X:

1. Select a subset of N out of M data points at random.

2. Hypothesis generation step: use the selected N points to estimate X.

3. Hypothesis verification step: count how many of the M data points fit the model within
a configurable tolerance. Call the proportion of data points fitting the model p.

4. If p is sufficiently good, exit the RANSAC algorithm and flag success.

5. Otherwise, go back to step 1 and repeat, up to Q times.

6. Exit after Q trials and flag failure - unable to find a model that adequately explains the
data.
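
The six steps can be sketched on a toy problem. The following Python example is illustrative only: instead of the projective transform used in this thesis, it fits a straight line y = a*x + b (so N = 2), and the tolerance, success proportion, trial count, and data are all assumed values chosen for the demonstration.

```python
import random


def fit_line(p, q):
    """Hypothesis generation: the line y = a*x + b through two points."""
    (x1, y1), (x2, y2) = p, q
    a = (y2 - y1) / (x2 - x1)
    return a, y1 - a * x1


def ransac_line(points, tol=0.5, p_required=0.7, max_trials=200, seed=0):
    """Return (model, inlier_fraction) on success, or (None, 0.0) after max_trials."""
    rng = random.Random(seed)
    for _ in range(max_trials):                  # steps 5 and 6: at most Q trials
        p, q = rng.sample(points, 2)             # step 1: random subset with N = 2
        if p[0] == q[0]:
            continue                             # degenerate sample, no line can be fitted
        a, b = fit_line(p, q)                    # step 2: estimate the parameters
        inliers = sum(1 for x, y in points       # step 3: count points within tolerance
                      if abs(y - (a * x + b)) <= tol)
        fraction = inliers / len(points)
        if fraction >= p_required:               # step 4: good enough, flag success
            return (a, b), fraction
    return None, 0.0                             # step 6: flag failure


# Ten points on y = 2x + 1 plus three gross outliers (hypothetical data).
inliers = [(float(x), 2.0 * x + 1.0) for x in range(10)]
outliers = [(2.0, 30.0), (5.0, -20.0), (8.0, 50.0)]
model, fraction = ransac_line(inliers + outliers)
```

Any sample that contains an outlier produces a line supported by only a few points, so it is rejected at step 4 and the loop draws again.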

This thesis uses a variant of the RANSAC algorithm called the M-estimator
SAmple and Consensus (MSAC) algorithm. The MSAC algorithm scores each hypothesis by how
well the inliers fit the model, rather than by the inlier count alone; a detailed evaluation
of all the RANSAC variants was conducted by Choi et al. in 1997 [17], where the differences
in the variants are detailed.
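
As commonly described in the literature, MSAC differs from RANSAC mainly in how a hypothesis is scored: RANSAC counts inliers, while MSAC sums squared residuals truncated at the threshold. The Python sketch below contrasts the two scores; the residual values and threshold are hypothetical, and this is not the MATLAB implementation used in the thesis.

```python
def ransac_score(residuals, threshold):
    """RANSAC-style score: number of inliers (higher is better)."""
    return sum(1 for r in residuals if abs(r) <= threshold)


def msac_cost(residuals, threshold):
    """MSAC-style cost: squared residuals truncated at threshold**2 (lower is better)."""
    t2 = threshold ** 2
    return sum(min(r * r, t2) for r in residuals)


tight = [0.1, -0.1, 0.2, 5.0]   # hypothetical residuals of candidate model A
loose = [0.9, -0.9, 0.8, 5.0]   # hypothetical residuals of candidate model B

same_count = ransac_score(tight, 1.0) == ransac_score(loose, 1.0)
prefers_tight = msac_cost(tight, 1.0) < msac_cost(loose, 1.0)
```

Both candidates have three inliers, so the inlier count cannot separate them, whereas the truncated cost favors the model whose inliers fit more closely.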


2.5 Estimating System State Using Kalman Filters 

If the IMMAT navigation approach were treated as a measurement process of a UAV’s
position and attitude within the environment, then the output would contain noise and have
uncertainty within each observation. Further, there could be omissions from the output of
the image-matching algorithm should inadequate matches be found. In such instances, one
approach to infer parameters or system states of interest, such as position and attitude, from
the noisy and intermittent output is the Kalman filter [18].

Broadly explained, a Kalman filter aims to minimize the mean square error of the estimated
parameters, assuming the noise in the measured data is Gaussian. Kalman filters are widely
used in the military context, for example to track targets by radar. In this thesis, the Kalman
filter is used to filter the outputs of the IMMAT algorithm to suppress the Gaussian noise.
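
A minimal scalar example shows how a Kalman filter can handle both noise and dropped measurements. This Python sketch is not the filter used in the thesis: it assumes a single state that is modeled as roughly constant (a random walk), the noise variances `q` and `r` are illustrative guesses, and `None` stands in for a frame where image matching produced no output.

```python
def kalman_1d(measurements, q=0.01, r=4.0, x0=0.0, p0=100.0):
    """Filter a scalar series; `None` marks a dropped measurement.

    q is the assumed process-noise variance and r the assumed
    measurement-noise variance. Returns the filtered state estimates.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q                   # predict: state held constant, uncertainty grows
        if z is not None:           # update only when a measurement is available
            k = p / (p + r)         # Kalman gain
            x = x + k * (z - x)
            p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# Hypothetical noisy observations of a value near 10, with three dropouts.
noisy = [10.8, 9.1, None, 10.5, 9.7, None, None, 10.2, 9.9, 10.1]
smoothed = kalman_1d(noisy, x0=noisy[0])
```

During a dropout the estimate is simply carried forward while its variance grows, so the next valid measurement is weighted more heavily.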





CHAPTER 3: 
Image-Matching Paradigm 


This chapter provides details of the IMMAT implementation, which was developed within the
MATLAB environment to take advantage of its image processing toolkits, efficient matrix-based
operations and built-in optimization algorithms. The algorithm developed will be used to
estimate the UAV’s position and attitude.

3.1 IMMAT System Architecture 

The overall IMMAT navigation concept as proposed by Yakimenko and Decker [3] is
presented graphically in Figure 3.1. The workflow illustrated in the diagram is elaborated
in the ensuing sections, and it matches the functional decomposition that was conducted in
Chapter 1. Broadly, the image-matching task steps through two
main phases: the planning phase and the real-time operations phase. The planning phase
contains all the steps leading up to the generation of the Reference Image Library, while the
real-time operations phase contains all the steps after.

3.2 Generating the Reference Image Library 

All steps up to and including the generation of the Reference Image Library (RIL) are done in
the planning phase. This phase involves tasks that can be planned and prepared ahead of
time, preferably off-line (which, in the case of a UAV, means pre-flight), as they do not
require real-time processing onboard the UAV. Two such steps are (1) planning an anticipated
trajectory of the UAV and (2) generating the RIL, which
will be used by the UAV in flight. This phase requires an a priori nominal
trajectory, which is a limitation of this approach.







Figure 3.1. Schematic of the Image-matching algorithm workflow. Source: [3].


3.2.1 Planning Nominal Trajectory 

Assuming that the UAV has been assigned a mission within a known area of operations, 
an operator can roughly plan a trajectory the UAV is expected to follow. This planned 
trajectory is termed the nominal trajectory for the flight. 

For the purposes of this thesis, the nominal trajectories used for the research were created
from real flight profiles (presented in detail in the next chapter), as these
have accompanying ground truth information available for further analysis.

The nominal trajectory for this research was created from raw flight data. First, a 25-period
running average of the data points was used to address aliasing effects due to repeated data
points. Then, to smooth the planned trajectory, we fit a polynomial to the data. The
running-average step is described by the following block of MATLAB pseudo-code:

NSmooth = 25;  % span of the running average, in samples

% Smooth every channel of the raw trajectory with the same span
NominalTrajectory.t = smooth(RawTrajectory.t, NSmooth);
NominalTrajectory.East_m = smooth(RawTrajectory.East_m, NSmooth);
NominalTrajectory.North_m = smooth(RawTrajectory.North_m, NSmooth);
NominalTrajectory.Lat = smooth(RawTrajectory.Lat, NSmooth);
NominalTrajectory.Lon = smooth(RawTrajectory.Lon, NSmooth);
NominalTrajectory.Roll_deg = smooth(RawTrajectory.Roll_deg, NSmooth);
NominalTrajectory.Pitch_deg = smooth(RawTrajectory.Pitch_deg, NSmooth);
NominalTrajectory.Yaw_deg = smooth(RawTrajectory.Yaw_deg, NSmooth);
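
For readers without MATLAB, the running average can be sketched in plain Python. This is an approximation written for illustration, not a port of MATLAB's `smooth`: it assumes a centered window whose width shrinks symmetrically near the ends of the record, and the sample track is hypothetical.

```python
def moving_average(values, span=25):
    """Centered running average; the window shrinks symmetrically near the ends."""
    half = span // 2
    n = len(values)
    out = []
    for i in range(n):
        h = min(half, i, n - 1 - i)           # largest symmetric half-width that fits
        window = values[i - h:i + h + 1]
        out.append(sum(window) / len(window))
    return out


# Hypothetical easting track in which every sample is logged twice.
raw_east = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
smooth_east = moving_average(raw_east, span=3)
```

The repeated samples produce a staircase in the raw track; after averaging, the values increase steadily, which is the effect the 25-period average is intended to achieve on the flight data.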


The in-flight phase (described in detail in Section 3.3) relies on a repository of
images against which an IMMAT algorithm compares incoming video frames from the
onboard UAV sensor to estimate the pose of the vehicle. To build this Reference Image
Library (RIL), satellite imagery of the known area of operations is retrieved and then
geo-referenced in the UTM coordinate system (discussed previously in Section 2.3.2).

Section 2.2 presented sources of geo-referenced imagery. This section details
how a reference image is created from the notional position and attitude of an Unmanned
Aerial Vehicle (UAV) following the planned nominal trajectory.

The nominal trajectory as described in the previous section (Section 3.2.1) is divided into
N points. At each of those points, a series of high-resolution images is extracted from
the satellite images along the planned path the UAV is expected to take. Extraction is
accomplished by using the nominal camera pose at those positions to generate the
Reference Images. Figure 3.2 shows a nominal trajectory superimposed on the raw track
data that was collected from an actual UAV flight (actual flight collection is presented in
Chapter 4). The nominal trajectory is divided into 35 segments in this example, where at
each of those points a reference image will be generated.

Figure 3.2. Creating a nominal trajectory. (Axes: scenery image XY coordinates; X = East,
positive to the right; Y = North, Y indexes descending.)

The pseudo-code used to generate positions for the reference images follows:

timeVector = [1:numel(range)]';

% Fit 9th-degree polynomials to the raw X and Y tracks
trajectory.X_fit = fit(timeVector, trajectory.XY_raw(:,1), 'poly9');
trajectory.Y_fit = fit(timeVector, trajectory.XY_raw(:,2), 'poly9');
trajectory.XY_fitted = ...
    [trajectory.X_fit(timeVector), trajectory.Y_fit(timeVector)];

% 35 evenly spaced sample times along the trajectory
refImagesTime = linspace(0, numel(range), 35+1);
refImagesTime = refImagesTime(1:35);

trajectory.RefImagesXYPosition = ...
    [trajectory.X_fit(refImagesTime), trajectory.Y_fit(refImagesTime)];
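
The sampling idea in the pseudo-code above (evenly spaced times, with the final point dropped) can be sketched in plain Python. This is an illustration under stated assumptions rather than a port: linear interpolation stands in for the 9th-degree polynomial fit, indices are zero-based, and the straight-line track is hypothetical.

```python
def reference_times(n_samples, n_refs=35):
    """Evenly spaced sample times over [0, n_samples - 1), dropping the end point."""
    step = (n_samples - 1) / n_refs
    return [i * step for i in range(n_refs)]


def interpolate(track, t):
    """Linearly interpolate a list of (x, y) samples at fractional index t."""
    i = min(int(t), len(track) - 2)
    frac = t - i
    (x0, y0), (x1, y1) = track[i], track[i + 1]
    return x0 + frac * (x1 - x0), y0 + frac * (y1 - y0)


# Hypothetical smoothed track: 71 samples along the line y = 2x.
track = [(float(k), 2.0 * k) for k in range(71)]
times = reference_times(len(track), n_refs=35)
ref_positions = [interpolate(track, t) for t in times]
```

Each entry of `ref_positions` would be a location at which one reference image is generated.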





After generating the positions for the reference images, the roll of the camera is set to zero,
while the pitch and yaw follow those of the nominal trajectory. With the three-dimensional
position, roll, pitch and yaw of an imaginary camera, the four corners of the camera’s
field-of-view are projected from that position to the ground plane. The projection
intersects the ground in a trapezium-shaped patch, which is cropped and warped into the
camera’s view.
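
The corner projection can be sketched as a ray-plane intersection. The Python example below is a simplified illustration, not the thesis implementation: it assumes flat ground at a constant elevation, works in an East-North-Up frame, and takes the corner ray directions as given inputs rather than deriving them from roll, pitch, yaw and the field-of-view. All numeric values are hypothetical.

```python
def ground_intersection(camera_enu, ray_enu, ground_elev_m=0.0):
    """Intersect a viewing ray with the horizontal plane Up = ground_elev_m.

    camera_enu: (east, north, up) camera position in metres.
    ray_enu: (east, north, up) ray direction; it need not be unit length.
    Returns the (east, north) hit point, or None if the ray never reaches the ground.
    """
    e, n, u = camera_enu
    de, dn, du = ray_enu
    if du >= 0.0 or u <= ground_elev_m:
        return None                      # ray at or above the horizon: no footprint
    s = (ground_elev_m - u) / du         # positive distance factor along the ray
    return e + s * de, n + s * dn


camera = (0.0, 0.0, 100.0)               # hypothetical camera 100 m above flat ground
corner_rays = [(-0.5, 0.5, -1.0), (0.5, 0.5, -1.0),    # near corners (steeper rays)
               (0.5, 1.0, -0.5), (-0.5, 1.0, -0.5)]    # far corners (shallower rays)
footprint = [ground_intersection(camera, ray) for ray in corner_rays]
above_horizon = ground_intersection(camera, (0.0, 1.0, 0.1))
```

The far corners land farther apart than the near corners, giving the trapezium-shaped patch; a ray that points at or above the horizon returns `None`, which is the case the reference image generation cannot handle.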

3.2.2 Creating a Reference Image 

With the position and attitude information of a camera following the nominal trajectory, the
center-point of the camera’s viewpoint is projected to the ground map, along with the four
corners of the camera’s field-of-view. The high-resolution map is then cropped to the area
enclosed by the four corners and projectively transformed into a rectangular view. This
represents a notional scene of what an onboard camera might see during a fly-past (Figure
3.3). For this algorithm to work, it is essential that the four corners of the camera view can
be projected onto the ground and that the view does not contain the horizon, owing to the
basic, two-dimensional reference image generation scheme employed at this time.















During this stage when the reference image is created, the projective transform $A_{3\times3}$ for
mapping a pixel within the reference image ($I_{ref}$) to the real world coordinates ($I^{UTM}$) is
computed and stored with the reference image in the Reference Image Library. The equation
that relates the image $(u, v)$ pixel to the real world coordinates $(x_{UTM}, y_{UTM})$ is provided by

$$
I^{UTM} = A_{3\times3} I_{ref}, \qquad
\begin{bmatrix} x_{UTM} \\ y_{UTM} \\ w \end{bmatrix}
= A_{3\times3} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
\tag{3.1}
$$

where $A_{3\times3}$ is the transformation matrix and $w$ is the scaling variable.
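
Applying the mapping in Equation 3.1 to a single pixel can be sketched as follows. This Python example assumes the usual homogeneous-coordinate convention, in which the first two components are divided by the scaling term w to obtain metric coordinates. The matrix values are hypothetical: a 0.5 m-per-pixel scale with image rows increasing southwards and an arbitrary UTM origin.

```python
def apply_homography(h, u, v):
    """Map pixel (u, v) through the 3x3 matrix h and normalise by the scale term w."""
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    if w == 0.0:
        raise ValueError("pixel maps to infinity (w = 0)")
    return x / w, y / w


# Hypothetical reference-image-to-UTM transform (not taken from the thesis data).
A = [[0.5, 0.0, 700000.0],
     [0.0, -0.5, 3960000.0],
     [0.0, 0.0, 1.0]]
easting, northing = apply_homography(A, 100, 200)
```

For this purely affine example w is always 1; a general projective transform has a non-trivial last row, and the division by w then matters.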

The generated reference images are stored together with the location and view perspective 
of the camera as the geo-referencing information within the RIL. 

To further take advantage of the pre-planning phase, computationally hungry
image-processing tasks such as feature extraction can be conducted on the reference view images
(RVIs), and the extracted features stored with the RVIs in the RIL before it is loaded onto
the UAV. This reduces the number of computational cycles onboard the UAV.

The implicit assumptions for the algorithm to work are the following: 

1. The terrain as viewed from the UAV’s camera can be adequately represented with a 
re-projected satellite view of the terrain; 

2. Sufficient feature matches must be found between the RVI and the camera image to 
establish the transform between the two images; 

3. The images coming through the sensor need to be downward-looking, so that the
four corners of the sensor’s field-of-view always intersect the ground plane. The
algorithm will fail if any one of the corners is projected above the horizon.

Once the RIL is generated, it should be stored onboard the UAV prior to mission deployment 
so that pose estimates during actual flight can be obtained. 





3.3 In-Flight Phase 

After the RIL is created, it can then be loaded onto the UAV so that it is available for in-flight 
use. The rest of this section describes how the reference frames within the RIL are used by 
an image-matching algorithm to estimate the location and the pose of the UAV. 

The image-matching algorithm works in two main stages. The first stage
is to find a geometric transformation that is able to relate all pixels in the TASE camera’s
image to their geographic location. The second stage is to estimate the location and attitude
of a would-be camera in space that would be able to create such a footprint of the features
found in the TASE image on the ground.

3.3.1 Finding Matching Features in Reference Image and Camera Image

The “closest” corresponding reference frame within the RIL is selected using whatever
information is available, such as the last known coordinates of the UAV extrapolated
in time along the heading it was previously taking.
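
One way to realize this selection is dead reckoning followed by a nearest-neighbor search. The Python sketch below is an assumption-laden illustration, not the thesis's selection logic: it assumes a constant heading measured clockwise from north, a known ground speed, flat east-north coordinates in meters, and hypothetical reference positions.

```python
import math


def predict_position(last_en, heading_deg, speed_mps, dt_s):
    """Dead-reckon from the last known (east, north) along a constant heading."""
    h = math.radians(heading_deg)        # heading measured clockwise from north
    d = speed_mps * dt_s
    return last_en[0] + d * math.sin(h), last_en[1] + d * math.cos(h)


def closest_reference(ref_positions, predicted_en):
    """Index of the reference frame whose stored position is nearest the prediction."""
    return min(range(len(ref_positions)),
               key=lambda i: math.hypot(ref_positions[i][0] - predicted_en[0],
                                        ref_positions[i][1] - predicted_en[1]))


refs = [(0.0, 0.0), (0.0, 100.0), (0.0, 200.0)]   # hypothetical RIL frame positions
guess = predict_position((0.0, 20.0), heading_deg=0.0, speed_mps=30.0, dt_s=3.0)
index = closest_reference(refs, guess)
```

Here the UAV is predicted to be 110 m north of the origin, so the second reference frame is selected.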

After the appropriate reference frame has been selected, the features for that reference frame 
are matched with the features extracted from the onboard sensor image. 

When attempting to match the reference image to the camera image, one may encounter
three possible outcomes: (1) the scenes overlap and there are sufficient matches between
the reference frame and camera image; (2) although the scenes match, there are insufficient
matches between the reference frame and the camera image; and (3) no match is found
between the reference frame and the camera image because the scenes within the images do
not overlap. These three possibilities are elaborated each in turn, with graphical examples
provided. Known situations where IMMAT drops might occur due to (2) and (3) are also
highlighted.

Sufficient Matches Between Reference Frame and Camera Image. In this situation, the
nominal trajectory provided a usefully accurate position and pose for the reference image
generation algorithm to capture the appropriate scene having a view that overlaps with the
camera image. Furthermore, the reference image and the camera image are sufficiently
feature rich that, after the MSAC algorithm is run to remove outliers, there are sufficient
matching points for estimating a mapping transformation that maps camera image pixel
position to real world coordinates.

Insufficient Matches Between Reference Frame and Camera Image. In this situation,
although the nominal trajectory provided a sufficiently accurate position and pose for a
camera to generate a reference frame that overlaps in view with the camera image, there are
inadequate inlier matches between the reference frame and the camera image to produce an
estimate for the transformation that maps the camera image pixels to real world coordinates.
This situation constitutes an IMMAT drop. One example where insufficient
correspondences were found is given in Figure 3.4. In that figure, while the Reference
Frame scene matches the camera image, there are insufficient correspondences found after
MSAC. For the top pair, green markers show the top few strongest SURF features that
were extracted. The bottom pair shows the remaining features after RANSAC outlier
culling, which are inadequate for estimating a geometric transform. Another example,
attributable to a different reason, where correspondences cannot be found is given in Figure
3.5. In this instance, the camera was experiencing vibrations and therefore insufficient inlier
correspondences could be found between the reference images and the camera images.

No Matches Between Reference Frame and Camera Image. In this situation, the scene
generated from the nominal trajectory location and pose does not overlap with the
camera image. This could be due to perturbations in the flight profile or deviations from
the flight profile that caused the camera not to view the scene that was expected.
This situation, like the previous case, constitutes an IMMAT drop. An example of this
is found in Figure 3.6. For this example, the camera image could not be matched with the
Reference Image because the IMMAT procedure had just switched to the next Reference
Image according to the nominal trajectory location prediction, causing a mismatch between
the scenes.

3.3.2 Finding UTM Coordinates of Matched Features by Optimization 

The information available at this stage of the problem is (1) the live camera video frame,
(2) an appropriately selected reference image from the RIL, and (3) the parameters of the
camera, such as the FOV and focal lengths in both horizontal and vertical directions.

The reference frame from the RIL is accompanied by geo-referencing information. The
reference frame itself had previously been warped based on the information available in the
nominal trajectory, that is, the three-dimensional spatial position of the drone, as well as
the viewing direction of the camera.

Figure 3.4. Insufficient matches.

Assuming that sufficient features were found in the previous stage from extracting SURF 
features in both reference frame and the camera image (see Figure 3.7), these features will 
need to be paired in the next step. 

An MSAC algorithm is used to sift through the features and find the best pairings between 
the features of both the video frame and the reference frame, discarding pairings that 
fall below a user configurable threshold. With the pairings, a two-dimensional projective 
transformation can be computed that maps the video frame view to the reference frame 
perspective. 

Using the matched inliers between the reference frame and the camera image, a perspective
transform is computed between the onboard sensor's X-Y coordinates and the reference
frame's coordinates.

The matched pairs of features are then used to estimate a projective transform $B_{3\times3}$ that
maps features in the reference image ($I^{ref}$) to the features in the camera image ($I^{cam}$):

$$I^{cam} = B_{3\times3}\, I^{ref} \tag{3.2}$$

Accounting for Equation 3.1,

$$I^{UTM} = A_{3\times3}\, B_{3\times3}^{-1}\, I^{cam} \tag{3.3}$$

Figure 3.6. Example where the Reference Frame scene does not match with
Camera Image. (Panel title: Camera Image 771 of 1783.)

Figure 3.7. A coarse correspondence is found between features in the Reference
Frame and Camera Image before MSAC outlier exclusion.


As the initial matching may contain outliers, it is run through the MSAC algorithm to 
remove outliers. Figure 3.8 shows the corresponding features after outliers had been culled 
by the MSAC algorithm. 

Up to this stage, (1) the SURF features for both the appropriate Reference Image and the
camera image are extracted, (2) a rough correspondence match between the two images is
found, and then (3) the outliers in the matching are removed by the MSAC algorithm. Figure
3.9 shows the result of a sample full run of a trajectory through the feature extraction and
outlier culling process. The corresponding reference image and camera image each start off
with numerous features (shown in the top sub-plot), which after a coarse matching reduce
significantly to the order of tens (magenta plot). After the RANSAC/MSAC procedure for this
example, on the order of 5-10 points are left, which are used in the next stage to
estimate the projective transform to find the UTM coordinates of those features.



















Figure 3.8. A montage of reference and camera image after MSAC outlier 
culling. 


It is essential that the camera be downward-facing so that the four corners of the
camera's field of view can be projected onto the horizon plane. This approach fails when
one of the projected corners lies above the horizon plane.

Using that relationship (transform), it is possible to project the matched points onto the
ground. An optimization procedure is then executed to minimize the errors related to where
the camera would have been in order to observe the points projected onto the ground in that
way. The hypothesis is that this procedure provides a reasonable estimate of where
the camera was (position and orientation) when the image was recorded.

Estimating the Position and Attitude of the Camera. In the previous phase, actual
geographical locations were identified for features within the TASE camera image. In this
phase, the question at hand is to estimate the position and the attitude of the camera that
would best match the same projected view on the ground. In other words, which estimated
position and attitude in space of a would-be camera allow the features of the camera frame
to be projected onto the ground in UTM coordinates so that the overlap with those points
is maximized?

In diagrammatic form, Figure 3.10 shows the UTM coordinate positions of four features
(illustrated for simplicity as the four corners of the image) that were established in the earlier
phase. From an estimated position and attitude of the UAV, the features inside the camera
frame as extracted earlier are projected onto the ground, generating guess_i positions, where i
indexes each matched feature. At this point of the procedure, these projected features should
be close to the observed positions of the features.

Figure 3.9. Feature extraction and outlier culling. (Legend: Number of Matches; RANSAC Inlier Matches. Horizontal axis: Time, s.)

The deviations are given by

$$\Delta_i = \mathrm{features.position}_i - \mathrm{guess.position}_i \tag{3.4}$$

The cost function for this problem is defined to be the sum of squared deviations

$$\mathrm{ErrorSumOfSquares} = \sum_i \|\Delta_i\|^2 \tag{3.5}$$

Minimizing the error as computed by the cost function will produce the best estimate (within
a configurable tolerance) for the location and attitude of the UAV. The reduction in the error
between the observed value and the estimated value is done by an optimization algorithm.

Figure 3.10. Projecting camera frame features (illustrated as corners for
simplicity) onto ground in UTM coordinates from an initial estimated state
[Easting, Northing, Up, Roll, Pitch, Yaw].

During the concept exploration phase, an earlier implementation of the estimation process
used an unconstrained optimization algorithm on the estimate of the UAV's position and
attitude. As a lot of information is known about the possible pose and location of a UAV
given a planned trajectory, this information should be able to constrain the possible position
of the would-be camera in space. In order to evaluate the efficiency and accuracy of
constrained versus unconstrained optimization for the purposes of estimating the attitude
and pose of a camera, a script was written to project the four corners of a viewfinder to the
ground at a known position and pose (Figure 3.11).
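The cost function of Equations 3.4 and 3.5 can be sketched in a few lines of MATLAB. The coordinates below are hypothetical, and the array guessPosition simply stands in for the output of the projection step, which is not reproduced here.

```matlab
% Observed UTM positions of matched features (hypothetical), one row per feature.
featuresPosition = [699650 3956100; 699700 3956150; 699620 3956180];

% Positions obtained by projecting the same features from a guessed camera
% state (hypothetical values standing in for the projection step).
guessPosition = [699653 3956104; 699700 3956150; 699619 3956180];

% Eq. 3.4: per-feature deviation vectors.
delta = featuresPosition - guessPosition;

% Eq. 3.5: sum of the squared Euclidean norms of the deviations.
errorSumOfSquares = sum(sum(delta .^ 2, 2));
```

An optimizer would re-evaluate this quantity for each candidate camera state and keep the state that makes it smallest.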

For both the constrained and unconstrained search, the same initial estimate for position
and pose of the camera was used. In the constrained search, the search was bounded to
within 500m for position and 30° for attitude, which approximates the bounds that will be set
using the interpolated position of the UAV in air as well as the nominal roll, pitch and yaw of the
UAV at that point in time based on the nominal trajectory.

Figure 3.11. Viewfinder corner projection to ground.

A sample run from within the MATLAB environment using fminsearch and then the
same problem done with fmincon is shown in Figure 3.12. The unconstrained search for
an optimum took significantly longer to converge, taking close to 800 iterations versus 29
iterations for the constrained search. The unconstrained search also ended at a final error function
value that is higher (i.e., worse) than that of the constrained optimization solution
(the error function value was 226 for the unconstrained search versus zero to within machine
precision for the constrained search). Further runs confirmed that the constrained search found
the solutions much faster and more accurately.
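The comparison above can be mimicked on a toy problem. The sketch below is not the script used in the thesis: it assumes a simplified nadir-looking footprint model with a state [Easting, Northing, Up, Yaw], uses base MATLAB's fminsearch for both runs, and imposes bounds by clamping the state inside the cost function as a stand-in for fmincon (which requires the Optimization Toolbox). All numerical values are hypothetical.

```matlab
% Toy footprint model (an assumption): a nadir-pointing camera at [E N U]
% with yaw psi sees a square footprint whose half-width grows with height U.
halfFov = 15 * pi / 180;
unitCorners = [-1 -1; 1 -1; 1 1; -1 1];              % corner pattern, one row each
rot = @(psi) [cos(psi) -sin(psi); sin(psi) cos(psi)];
project = @(s) bsxfun(@plus, (s(3) * tan(halfFov)) * unitCorners * rot(s(4))', s(1:2));

trueState = [0 0 2000 15 * pi / 180];                % [E N U yaw], hypothetical
observed  = project(trueState);                      % "observed" corner positions

% Eq. 3.5 cost: sum of squared deviations between observed and projected corners.
cost = @(s) sum(sum((observed - project(s)) .^ 2));

x0 = [200 -150 1800 5 * pi / 180];                   % initial (nominal) estimate
lb = x0 - [500 500 500 30 * pi / 180];               % lower bounds around nominal
ub = x0 + [500 500 500 30 * pi / 180];               % upper bounds around nominal
clamp = @(s) min(max(s, lb), ub);                    % force a state into the bounds

opts = optimset('Display', 'off');
xFree    = fminsearch(cost, x0, opts);                        % unconstrained search
xBounded = clamp(fminsearch(@(s) cost(clamp(s)), x0, opts));  % bounded via clamping
```

Comparing cost(xFree) with cost(xBounded), or requesting the additional output arguments of fminsearch to read the iteration counts, gives a rough feel for the effect of bounds. The figures quoted in the text (close to 800 versus 29 iterations) come from the thesis runs and should not be expected from this toy.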

For the IMMAT algorithm, the constrained optimization algorithm is initialized with the
orientation and the extrapolated position of where the UAV might have been if it were
following the nominal trajectory. The lower and upper bounds were set
to the nominal value ±30° for roll, pitch and yaw, and to the nominal value ±500m for Easting
and Northing. For altitude, the bounds were set to the nominal value ±500m, with a lower
limit of 50m, given that the lowest flying trajectory was at 150m.

3.3.3 Pitfalls of Not Having Good Data 

For the IMMAT algorithm to successfully estimate a projective transformation between the
reference image and the camera image, a minimum of five control-point pairs is required.
When this is used as a filtering criterion, so that an estimate is considered valid only when
there are five or more control-point pairs, and the algorithm is run on a representative trajectory
(as an example, one from Camp Roberts), the observed drop rates are relatively
high. Figure 3.13 shows the output of a trajectory through the IMMAT algorithm.

Figure 3.12. Unconstrained versus constrained optimization for estimating
UAV position and attitude. (Panel annotations. Unconstrained Optimization: True: 0 0 -2000 -10 0 15; Found: 6 -11 -2020 -10 0 15. Constrained Optimization: True: 0 0 -2000 -10 0 15; Found: -0 0 -2000 -10 -0 15. Markers: observed, calculated. Horizontal axis: Northing, m.)

For Camp Roberts, it was observed that even though the number of features extracted
from the reference image and the camera image numbered in the thousands, after coarse
correspondence (see the magenta plot in the second block of Figure 3.13), the number
of matches drops significantly to the order of tens. After RANSAC is performed, there
is a further reduction, with few remaining points that meet the five-or-more requirement
for estimating a projective transformation. Figure 3.14 shows the distribution of control
points that was generated by the IMMAT algorithm, and Figure 3.15 presents the data as
a cumulative distribution. As can be seen, about 20% of the entire trajectory produces
sufficient control points that can then be used to estimate a projective transformation.

(Figure 3.13 plot legend: Number of matches; RANSAC inlier matches; Reference image number.)

Even though the number of data points that can be used to estimate the location and attitude
of a UAV is not high, it is still possible to produce useful estimates by feeding the outcomes
of the IMMAT algorithm to a Kalman filter (presented in the following section), which has
built-in predictive capabilities. To this end, Tables 3.1 and 3.2 show two sample outputs of
the IMMAT estimates for location and attitude, illustrating the case with adequate
matching pairs versus the case with insufficient matching pairs. As seen by looking at the
error columns, when sufficient matches are found, the IMMAT algorithm performs
relatively well. By contrast, when the number of matches is below five and the projective
transformation cannot be computed, there is significant degradation of performance in the
IMMAT estimates.
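The five-pair validity criterion and the resulting drop rate can be illustrated with a short MATLAB sketch. The inlier counts below are invented for illustration and are not taken from the Camp Roberts run.

```matlab
% Hypothetical inlier counts per processed camera frame (after MSAC/RANSAC).
inlierCounts = [0 2 7 5 1 0 9 4 6 3];

minPairs = 5;                             % threshold stated in the text
isValid  = inlierCounts >= minPairs;      % frames that yield a usable estimate
dropRate = mean(~isValid);                % fraction of frames counted as drops

% Cumulative view, in the spirit of a cumulative histogram:
% share of frames with at least k control-point pairs, for k = 0..5.
k = 0:5;
shareAtLeast = arrayfun(@(t) mean(inlierCounts >= t), k);
```

Only the frames flagged in isValid would be passed on as measurements; the remaining frames are the gaps that the Kalman filter has to bridge.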

Apart from studying the magnitude of errors during the estimation phase, the effect on
convergence produced by sufficient versus insufficient control points was also studied. The
rate of convergence for a seven control-point match is shown in Figure 3.16, while the rate
of convergence for two matches is shown in Figure 3.17. With seven points, it was possible
to compute a projective transform, yielding useful estimates that were able to converge,
giving a final cost function value of around 11. In the other case, the number of control
points used was insufficient to estimate a projective transform, producing estimates that
were in fact spurious, and the optimization procedure took more iterations and ended at a
cost function value that was higher.

Figure 3.14. Distribution of control-point pairs. (Horizontal axis: # of matched inliers.)

Figure 3.15. Cumulative histogram of control-point pairs. (Horizontal axis: # of control points.)


Table 3.1. Seven control-points estimate errors.

              Estimated    Truth      Error
Easting, m    699660       699654     -6
Northing, m   3956166      3956116    -50
Up, m         186          161        -25
Roll, °       -10          5          15
Pitch, °      33           45         12
Yaw, °        -96          -98        -2


Table 3.2. Two control-points estimate errors.

              Estimated    Truth      Error
Easting, m    699581       699761     180
Northing, m   3956511      3956998    487
Up, m         150          235        84
Roll, °       16           0          -16
Pitch, °      65           51         -14
Yaw, °        -114         -99        16


3.4 Filtering Image-Matching Algorithm Output with a 
Kalman Filter 

In the previous section, dry runs on sample trajectories revealed that tracks can produce
adequate estimates, but appear jumpy and lossy. In a sense, estimating the position and
attitude of a UAV by comparing an observed scene image and cross-checking it against a
geo-referenced reference image is taking a physical measurement of the location and attitude
of a UAV in the world space against its operating environment. As physical measurement
processes are expected to have some uncertainty, the raw Image Matching algorithm output
is considered to be that raw measurement.

A Kalman filter was then used to post-process the raw IMMAT output. In order to use
the Kalman filter, a simple kinematic model is used to describe the UAV's motion. Let X
represent the position (Easting, Northing, Up) and attitude (φ roll, θ pitch and ψ yaw) of
the UAV, and let the motion of the UAV be modelled as in Equation 3.6, where V is
the respective rates of change (in other words, velocity) to be estimated.

This simple kinematic model is sufficient for the trials that were conducted. The trials were
all conducted in a race-course fashion with straight legs of constant velocity (test flights
will be described in detail in the next chapter).

Figure 3.16. Rate of convergence for attitude and pose using seven control-points.
(Horizontal axis: Iteration.)

$$X = \begin{bmatrix} E & N & U & \phi & \theta & \psi \end{bmatrix}^{T}, \qquad X_k = X_{k-1} + V_{k-1}\,\Delta t, \qquad V_k = V_{k-1} \tag{3.6}$$
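As an illustration of how a constant-rate model of the form of Equation 3.6 lets a Kalman filter coast through IMMAT drops, the following MATLAB sketch filters a single channel (for example, an Easting offset). It is a minimal sketch under stated assumptions, not the filter used in the thesis: the time step, the noise covariances Q and R, the initial covariance, and the measurement values are all hypothetical, and each of the six channels is assumed to be filtered independently.

```matlab
dt = 1;                                   % time step, s (assumed)
F  = [1 dt; 0 1];                         % one channel of Eq. 3.6: x_k = x_{k-1} + v_{k-1} dt
H  = [1 0];                               % only the position-like quantity is measured
Q  = 0.01 * eye(2);                       % process noise covariance (assumed)
R  = 25;                                  % measurement noise variance (assumed)

% Hypothetical raw measurements of one channel; NaN marks frames
% where image matching dropped.
z = [0 11 19 NaN NaN 52 58 NaN 81 90];

x = [z(1); 0];                            % initial state [value; rate]
P = diag([R 100]);                        % initial covariance (assumed)
est = zeros(size(z));

for k = 1:numel(z)
    if k > 1
        x = F * x;                        % predict state
        P = F * P * F' + Q;               % predict covariance
    end
    if ~isnan(z(k))                       % update only when a measurement exists
        K = P * H' / (H * P * H' + R);    % Kalman gain
        x = x + K * (z(k) - H * x);
        P = (eye(2) - K * H) * P;
    end
    est(k) = x(1);                        % filtered value for this frame
end
```

During the NaN frames the estimate advances at the last estimated rate, which is the predictive behaviour the text relies on to bridge drops.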


3.5 General Observations 

Having walked through the steps of the entire IMMAT procedure, it is possible to summarize 
a number of factors that can impact the performance of the IMMAT algorithm. 

First, drops in the IMMAT algorithm can be caused by various factors, such as insufficient
matches between the reference frame and the camera image. That could be due to blurriness
in the camera images caused by vibration or motion, or because the camera was not
looking at the same spot while the aircraft was moving and subject to disturbances due
to wind or pilot maneuvers. Poor matching can also occur when a reference image is
insufficiently rich in extractable features, which can result in a low inlier count after rough
correspondence matching and then final outlier exclusion through the MSAC algorithm.
Finally, the remaining points might be insufficient to estimate a projective transform (which
mathematically requires a minimum of four corresponding points).

Figure 3.17. Rate of convergence for attitude and pose using two control-points.
(Horizontal axis: Iteration.)

In order to improve the current performance of the IMMAT algorithm, another modeling
method for the re-projection might be used. For example, an affine transformation, although
less representative of a perspective view of the terrain, requires fewer control-point
pairs to estimate a transform; this trade-off between accuracy and generating more
estimates offers an avenue for further studies. Also, the RANSAC algorithm culls numerous
potential control-point pairs in the process of estimating a projective transform. The effect
of relaxing the tolerances in the RANSAC algorithm can also be further studied, reducing
the accuracy in exchange for generating more estimates that might be useful during the
Kalman filtering phase.
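The point about an affine model needing fewer control-point pairs can be made concrete: a planar affine transform has six unknowns and is fixed by three non-collinear pairs, whereas a projective transform has eight and needs four. The MATLAB sketch below fits an affine map to three hypothetical point pairs; it is an illustration only and not part of the IMMAT implementation.

```matlab
% Three hypothetical control-point pairs, one row per point.
src = [0 0; 1 0; 0 1];
dst = [2 3; 4 3; 2 6];

% Affine model: [x' y'] = [x y 1] * M, with M a 3x2 matrix of six unknowns.
% Three non-collinear pairs give six equations, enough to solve for M exactly.
M = [src, ones(3, 1)] \ dst;

mapped = [src, ones(3, 1)] * M;   % reproduces dst up to rounding
```

With more than three pairs, the same backslash operation returns a least-squares fit, which is one way additional inliers could be used when they are available.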


Figure 3.18. Before and after culling points that were generated with insufficient
control-point pairs between reference image and camera image.





CHAPTER 4: 

Test and Evaluation Setup and Procedures 


In the previous chapter, the workings of the image-matching concept were described in 
detail. This chapter describes the data sets used and the tests that were conducted to 
evaluate the performance of the IMMAT algorithm used. 

4.1 Test Equipment and Data Collection Procedures 

This study involved two aerial platforms: an unmanned Tier-2 Arcturus T-20 aerial vehicle
and a manned Cessna-206, both equipped with the TASE 200 sensor.

According to the manufacturer, the TASE 200 sensor is intended to be a compact, lightweight,
low-cost daylight and infrared camera sensor system. The sensor comes with onboard
GPS/INS that allows the system to capture and record ground truth information while in
flight. The pertinent specifications of the system are as follows:

• Horizontal Field-of-View: 10.5° for King City recording and 35.26° for Camp Roberts
recording;

• Image resolution: 640 x 480 pixels; however, after interpreting the TASE data, the
resolution was found to be 696 x 464 pixels;

• Embedded GPS/INS sensors;

• Camera records at 30 Hz.

4.2 Actual Flight Data Collection 

Two sets of data were collected: one over King City and another at Camp Roberts. The
UAV collected data with the following characteristics:

• cruise speed of UAV

• distance travelled between snapshots

• estimated maximum roll, pitch, yaw and heading changes between each snapshot

The flights were conducted at different altitudes and aircraft attitudes in two different areas,
namely, (1) west of Paso Robles, California, and (2) west of King City, California. Detailed
descriptions of the two areas follow.

The first area is within the restricted airspace R-R2504 west of Paso Robles, California.
That area has varying, undulating terrain (sample images of the terrain in Camp Roberts are
given in Figure 4.1), with an elevation of about 300m. The TASE video stream data were
collected at various altitudes as shown in Figure 4.2.



Figure 4.1. Sample images of Camp Roberts’ terrain. 



Figure 4.2. Altitude profile versus camera frame number. 


At each altitude, the UAV conducted a turn-straight-turn-straight-turn flight profile according
to the following (and as shown in Figure 4.3):

• one minute straight flight east recording in Electro-Optics (EO)

• one minute left turn

• one minute straight flight west recording in EO

• one minute left turn

• one minute straight flight east recording in Infrared (IR)

• one minute left turn

• one minute straight flight west recording in IR

• approximately one minute descent to the next lower altitude



Figure 4.3. Full flight profile for Camp Roberts data collection. 


The second area is to the west of King City, California, with a relatively flat terrain, of 
elevation 100m (see Figure 4.4). The area of interest is between Greenfield, California, 
and King City, California (airport identifier code is KKIC), closer to Greenfield. 

The TASE video stream data were collected at altitudes of 2000, 4000, 6000 and 8000 feet
mean sea level (MSL). At each altitude, the UAV conducted a turn-straight-turn-straight-turn
flight profile according to the following (see Figure 4.5 for an aerial view, and Figure 4.6
for the three-dimensional view):

• one minute straight flight east recording in EO

• one minute left turn

• one minute straight flight west recording in EO

• one minute left turn

• one minute straight flight east recording in IR

• one minute left turn

• one minute straight flight west recording in IR

• approximately one minute descent to the next lower altitude

Figure 4.4. Flight site at west of King City, California.

The area in King City is nearly flat, with few elevation changes. The terrain is gridded by
farmlands except towards the edges of the Salinas Valley, where it rises. Sample images of the
terrain in King City are shown in Figure 4.7.


4.3 Preliminary Steps 


Data analysis requires that the data be separated into segments of largely similar
headings for comparison. This thesis categorized the data by UAV heading
and altitude. This categorization allowed us to evaluate the performance of the
algorithm for the same heading direction of the UAV at different altitudes and, likewise, at
the same altitude but in different heading directions.







Figure 4.5. Flight profile at west of King City, California. 


4.3.1 Data Segmentation 

To chunk the data, we used a chunking algorithm to identify segments of flights with largely
similar headings (as outlined in the following discussion). Essentially, an initial estimated
heading is used to find data points that are within a certain band. These data points are then
used to work out the mean (μ) and standard deviation (σ) of the actual raw track. Data
points within 2σ of μ are then designated to be a track.

function [filteredData, avg] = filterTracks(data, estimate, band)
    % Coarse pass: keep samples within +/- band of the estimated heading.
    coarseSifting = abs(data - estimate) < band;
    avg = mean(data(coarseSifting));

    % Fine pass: logical mask of samples within two standard deviations of the mean.
    TwiceStdDev = 2 * std(data(coarseSifting));
    filteredData = (abs(data - avg) < TwiceStdDev);
end


Figure 4.6. 3D flight profile at west of King City, California. (Panel title: Truth Trajectory in ENU.)

Figure 4.7. Sample images of the terrain over the area between Greenfield,
California and King City, California.

The algorithm operates on the data by taking an initial estimate and a tolerance band to
coarsely sift out trajectories that might meet the initial estimate. The average and standard
deviation of the trajectories that fall within that band are then computed and used to keep
only the data that are within two standard deviations of the mean heading. An
example of the filtered King City data output is provided in Figure 4.8, and the alternate
view in latitude and longitude is provided in Figure 4.9.
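A small usage example may help to show what the two passes do. The heading samples, the estimate and the band below are hypothetical, and the two passes of filterTracks are written inline so that the snippet runs on its own.

```matlab
% Hypothetical heading samples in degrees: an eastbound leg (~90),
% a turn, and a westbound leg (~270).
heading = [88 91 90 89 92 120 180 240 268 271 270 269];

estimate = 90;    % initial guess for the eastbound leg heading
band     = 20;    % coarse tolerance band in degrees (assumed)

% Same two passes as filterTracks, written inline.
coarse = abs(heading - estimate) < band;                  % coarse sift
avg    = mean(heading(coarse));                           % mean heading of the leg
mask   = abs(heading - avg) < 2 * std(heading(coarse));   % within two sigma

eastLeg = heading(mask);   % samples assigned to the eastbound track
```

Here the coarse pass keeps the five samples near 90°, and the two-sigma pass retains all five. Like the function it mirrors, this sketch compares headings directly and therefore does not handle wrap-around at 0°/360°.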
























Figure 4.8. King City track segments. 

4.3.2 Treating Known Biases in Data 

As the equipment was not configured properly at the start of the flights at both Camp
Roberts and King City, some ground-truth information logged by the TASE imagers contained
biases.

For the Camp Roberts data, the pan of the TASE camera was incorrectly initialized
at -90 deg, causing the TASE imager to record the target position off to the left wing of
the UAV. In the case of the King City flight, the camera was mounted on the side strut of
the UAV, biasing the roll, pitch and yaw values. Data pre-processing was done to remove
some of these biases. An example where the viewing target was corrected is illustrated
in Figure 4.10.

4.4 Measures of Performance

Measures of Performance (MOPs) need to be developed to quantify and characterize the
performance of the image-matching approach. The following MOPs were used in the
experiments:

Errors in Position and Attitude Estimates. As the image-matching approach is for estimating
the attitude and location of the UAV, the most obvious MOPs are the errors
associated with the X, Y, Z position as well as the roll, pitch and yaw of the platform.
The performance over various altitudes and over different terrains was studied.

Image-matching Drop Rate. As the algorithm might not find matches all the time due to
the real tracks deviating from the nominal trajectory and due to the lack of major
image features within either the reference frames or the seeker images, measuring the
image-matching drop rate is useful to quantify the stability of the algorithm. A drop
was counted each time the IMMAT algorithm found fewer than five
control-point pairs between the reference image and the camera image.

Figure 4.9. King City track segments in Lat-Lon view. (Horizontal axis: Longitude, °.)

Figure 4.10. Correcting viewing target for data collected at Camp Roberts.


4.5 Meters per Pixel Resolution 

In order to establish whether there is any correspondence between performance of the 
algorithm and the amount of information that a pixel within the reference frame might 
cover, it is necessary to work out approximately how many meters a pixel of the UAV 
camera spans on the physical ground. 



Figure 4.11. Horizontal and vertical ground resolutions. 


Referencing Figure 4.11, it is possible to compute the average ground coverage
of the sensor as follows:

$$\cos(\theta) = \frac{H}{SlantRange}, \qquad \tan\left(\tfrac{1}{2}\theta_{hfov}\right) = \frac{L/2}{SlantRange} \tag{4.1}$$

$$L = \frac{2H\tan\left(\tfrac{1}{2}\theta_{hfov}\right)}{\cos(\theta)} \tag{4.2}$$

$$M = \frac{2H\tan\left(\tfrac{1}{2}\theta_{vfov}\right)}{\cos(\theta)} \tag{4.3}$$

In Equations 4.2 and 4.3, L and M are the average horizontal and vertical distances of the
projected center width and height of the camera field-of-view respectively, θ is the angle
between the camera's direction of view and the normal to the Earth's surface, θ_hfov and
θ_vfov are the horizontal and vertical fields of view, and H is the above-ground-level
height. The results for the horizontal and vertical coverages are
plotted in Figures 4.12 and 4.13 for Camp Roberts and King City respectively.
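As a worked example of Equations 4.1 and 4.2, the MATLAB sketch below computes the horizontal ground coverage and a corresponding meters-per-pixel figure. The height and look angle are hypothetical; the field of view and image width are the values quoted in the sensor specifications above. Dividing L by the image width to obtain meters per pixel is an assumption about how the per-pixel figure is formed.

```matlab
H        = 500;                 % height above ground level, m (hypothetical)
theta    = 30 * pi / 180;       % angle between view direction and surface normal (hypothetical)
hfov     = 35.26 * pi / 180;    % horizontal FOV quoted for the Camp Roberts recording
imgWidth = 640;                 % nominal image width, pixels (696 if the measured size is used)

slantRange = H / cos(theta);                % Eq. 4.1
L = 2 * H * tan(hfov / 2) / cos(theta);     % Eq. 4.2: ground width at image center, m

metersPerPixelH = L / imgWidth;             % average horizontal meters per pixel (assumed definition)
```

For these inputs the horizontal coverage is a little under 370m, or roughly 0.57m per pixel; the vertical figure follows in the same way from Equation 4.3 once the vertical field of view is known.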


Figure 4.12. Plot of distance per pixel in camera for Camp Roberts flight. (Legend: Average Horizontal resolution Down leg; Average Vertical resolution Down leg; Average Horizontal resolution Up leg; Average Vertical resolution Up leg; Satellite Resolution. Horizontal axis: Meters per pixel in Camera.)

Figure 4.13. Plot of distance per pixel in camera for King City flight. (Same legend and horizontal axis as Figure 4.12.)



CHAPTER 5: 
Data Analysis 


In the previous chapter, procedures for analyzing the data collected from test flights were
presented. This chapter presents an analysis of the data. After creating nominal trajectories
for all tracks that were collected from flights over Camp Roberts and King City, we used
the IMMAT algorithm on the video frames to estimate the location and the pose of the
UAV. The results generated by the IMMAT algorithm are analyzed in different dimensions
in each of the ensuing sections.

5.1 Performance of Algorithm at Different Altitudes

One assumption for the IMMAT algorithm to work was that the projected view of the
terrain would be an adequate approximation of the camera view for the purposes of image
matching. At lower altitudes, the effects of terrain contouring may be more apparent when
the terrain is viewed from the UAV's camera. However, the reference images are created
from satellite images, which are two-dimensional and can show a different view when
re-projected at low altitudes. Figure 5.1 shows a top view of a terrain which, when viewed
from a different perspective, shows the effect of terrain contours changing the view. This
may differ significantly from merely applying a projective transform to the two-dimensional
top view. Thus, this assumption may be violated should the re-projected view of the planar
satellite image differ from the actual perspective view of the physical terrain.



Figure 5.1. Perspective view of the terrain. 


The IMMAT output results for up-leg and down-leg flights at various altitudes, with different
fields of view used for the generation of reference images, are provided in Figures 5.2, 5.3
and 5.4 for Camp Roberts, while those captured from King City are found in Figures 5.5, 5.6 and 5.7.








Purely by analyzing the mean errors within the Camp Roberts data set, we see that the Northing
and pitch errors show a downward bias for both up-leg and down-leg flights, and that the
errors in the Easting, Northing and Up positions are biased downward as altitude increases.
Errors in positional estimates are lower at lower altitude.

For Roll, Pitch and Yaw, the error experienced in either the up-leg or the down-leg
flights appears to increase with altitude. Overall, the errors in Easting, Northing and Up are
about ±200m. Increasing the field of view used for the generation of Reference Images
does not appear to improve the accuracy of positional and pose estimates.

The variance of the estimates in Up increases with altitude, implying that the errors in
estimating altitude may be proportional to the altitude of the flight.


Figure 5.2. Camp Roberts error plots at various altitudes with Reference
Frames matching at 1x FOV. (Legend: Down Leg, Up Leg. Horizontal axis: Altitude AGL, m.)

Analyzing the King City results, we find that the mean errors for Easting
and Northing are around ±200m. The above-ground-level altitude error is also in line with the Camp
Roberts data at ±50m.

Figure 5.3. Camp Roberts error plots at various altitudes with Reference
Frames matching at 2x FOV. (Legend: Down Leg, Up Leg. Horizontal axis: Altitude AGL, m.)

Figure 5.8 shows the graphical outputs of the IMMAT algorithm at various FOVs for a
down-leg track flying at 600 feet AGL. As the FOV of the Reference Images increases,
covering more of the terrain, the overall number of potential IMMAT matches increases,
leading to a progressively denser plot of estimated position.

In general, images of the underlying terrain needs to be sufficiently feature-rich for the 
IMMAT algorithm to work. For the case of trajectories captured in King City, the feature 
counts in the reference frames were themselves generally low. Figure 5.9 shows the number 
of features that were extracted from each of the reference image and each camera image 
throughout an entire example trajectory (down-leg flight, at 5147 feet AGL). The track 
displayed visually is found in Figure 5.10. To begin with, the number of features fell 
below 800, averaging about 300 before coarse correspondence matching. The numbers 
after matching and RANSAC results in no inlier matches for nearly the entire trajectory. 

The satellite images used for this study over King City were captured about half a year after the actual flight; data sets closer to the actual date of flight were not used because they had cloud coverage that occluded land features, which are required for the IMMAT registration algorithm to work.

[Figure 5.4 panels: x-axis Altitude AGL, m; legend: Down Leg, Up Leg]

Figure 5.4. Camp Roberts error plots at various altitudes with Reference Frames matching at 3x FOV.

It was observed that the IMMAT algorithm was latching onto permanent and prominent terrain features, such as rivers or hills, which do not change much with time. Figure 5.10 shows the same trajectory as previously described for the case with low feature counts. The stored reference frame shows a distinctive and prominent landform, in this case a river.


5.2 Effect of Reference Image Field-of-View 

Intuitively, we surmise that the bigger the field-of-view of the reference image, the higher the likelihood that it contains the potential views of the in-flight camera. The photo montage in Figure 5.11 shows the reference frames (represented by the green patch) generated at 1x, 2x and 3x FOV, respectively. The black triangle and magenta track show the viewpoint of the in-flight camera. For a small FOV, it was usually difficult for the reference frame to contain the view direction from the actual flight viewing position. At larger FOVs, the viewpoints are usually contained inside the green patch (which will be projected into a rectangular reference frame).

[Figure 5.5 panels: x-axis Altitude AGL, m; legend: Down Leg, Up Leg]

Figure 5.5. King City error plots at various altitudes with Reference Frames matching at 1x FOV.

In general, increasing the field-of-view used for reference image generation directly led to an increase in the number of features that could be extracted from the RIL images. Figures 5.12, 5.13 and 5.14 show that when the field-of-view used to generate the reference frames increases, the number of features found in the reference frames increases from an average of around 1000 to around 10000. The number of inlier matches also increased on average with the field-of-view size. While it would also be useful to study the effect of increasing the field-of-view of the actual camera, live data was not available at the time of this study.

5.3 Drop-Rates of the Image Matching Algorithm 

In the previous section, the data suggest that within the altitudes of the data set (lower than about 2 km) the accuracy of the IMMAT algorithm does not vary appreciably with altitude.

[Figure 5.6 panels: x-axis Altitude AGL, m; legend: Down Leg, Up Leg]

Figure 5.6. King City error plots at various altitudes with Reference Frames matching at 2x FOV.

Another aspect that is important for the IMMAT algorithm to be operational is the drop-rate. As discussed in the previous chapter, a drop is assessed when the number of control points used to estimate the location and pose of the UAV is less than five. An example of the drop rates at various altitudes for a Camp Roberts flight is found in Figure 5.15. On the whole, the drop rates appear to increase with altitude. Drop rates are generally high; on average, 80 percent of the points do not have sufficient control points for the IMMAT algorithm to work.
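
The drop criterion above can be illustrated with a short sketch. It assumes that a per-frame count of control points is available for a track; the function names and the example counts are hypothetical and are not taken from the thesis code, which was written in MATLAB.

```python
from typing import Optional, Sequence

MIN_CONTROL_POINTS = 5  # threshold stated in the text: fewer than five is a drop


def is_drop(num_control_points: int, minimum: int = MIN_CONTROL_POINTS) -> bool:
    """Return True when a frame has too few control points to estimate pose."""
    return num_control_points < minimum


def drop_rate(control_point_counts: Sequence[int],
              minimum: int = MIN_CONTROL_POINTS) -> Optional[float]:
    """Fraction of frames assessed as drops, or None for an empty track."""
    if not control_point_counts:
        return None
    drops = sum(1 for n in control_point_counts if is_drop(n, minimum))
    return drops / len(control_point_counts)


# Hypothetical per-frame control-point counts for one track (illustrative only).
example_counts = [0, 3, 12, 4, 0, 7, 2, 0, 1, 5]
print(f"drop rate: {drop_rate(example_counts):.0%}")
```

Returning None for an empty track, rather than zero, keeps a track with no frames from being mistaken for a track with no drops.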


5.4 Distribution of Image Matching Predictions 

Analyzing the graphical output of the IMMAT algorithm, we find significant information about how the data are distributed. The outputs of an unconstrained search are briefly discussed in the ensuing paragraphs, and then a more detailed discussion of the constrained search results is provided.


[Figure 5.7 panels: x-axis Altitude AGL, m; legend: Down Leg, Up Leg]

Figure 5.7. King City error plots at various altitudes with Reference Frames matching at 3x FOV.

[Figure 5.8 panels: legend: Camera position estimates, Nominal trajectory, Trajectory Start, Trajectory End, Kalman Estimate]

Figure 5.8. Down leg track output for three different FOV sizes for Reference Frames.


The predictions for the position of the aerial camera were observed to spread about the nominal trajectory for all tracks that were analyzed (see Figure 5.16) when using an unconstrained search. The predictions are sparse in comparison to the constrained search (discussed next), and when an estimate is offered, it is noisy and jumpy. This is evidenced by the Kalman-filtered track, which provided an inaccurate predicted trajectory because of the unstable feeds coming out of the unconstrained search.

[Figure 5.9 panels: x-axis Time, s]

Figure 5.9. Generally low feature counts for King City trajectories.

Using a constrained search, we observed a significant improvement both in the number of estimates produced (see Figure 5.17 for a typical output) and in the accuracy of the estimates. For the sample track illustrated, when the IMMAT algorithm found an optimum close to the nominal trajectory, it provided a very accurate estimate of position. However, when the constrained search could not find a solution, it eventually reached the boundary of the fmincon search. As the Kalman filter is unable to distinguish between a good and a bad IMMAT estimate, the Kalman-filtered track also uses the predicted points along the boundary of the constrained search and moves towards them, as there were many more points there. This offers an opportunity to filter those outliers easily.
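
One way such a filter might look is sketched below. It assumes that each estimate is expressed as an (easting, northing) offset from the nominal position and that the search box is known; the 500 m box, the tolerance and all names are hypothetical, and this is not a filter that was implemented in the thesis.

```python
from typing import List, Sequence, Tuple

Offset = Tuple[float, float]  # (easting, northing) offset from nominal, metres


def on_search_boundary(offset: Sequence[float], lower: Sequence[float],
                       upper: Sequence[float], tol: float = 1e-3) -> bool:
    """True if any component of the offset sits on a bound (within tol)."""
    return any(
        abs(x - lo) <= tol or abs(x - hi) <= tol
        for x, lo, hi in zip(offset, lower, upper)
    )


def reject_boundary_estimates(offsets: Sequence[Offset], lower: Sequence[float],
                              upper: Sequence[float],
                              tol: float = 1e-3) -> List[Offset]:
    """Keep only the estimates that lie strictly inside the search box."""
    return [o for o in offsets if not on_search_boundary(o, lower, upper, tol)]


# Hypothetical +/-500 m search box and illustrative estimates (not thesis data).
LOWER = (-500.0, -500.0)
UPPER = (500.0, 500.0)
offsets = [(12.0, -8.5), (500.0, 40.0), (-3.2, 15.1), (-120.0, -500.0)]
kept = reject_boundary_estimates(offsets, LOWER, UPPER)
print(kept)
```

Only the surviving estimates would then be passed to the Kalman filter, so that points pinned to the search boundary no longer pull the filtered track away from the nominal trajectory.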


Figure 5.10. Salinas River, a prominent and distinctive landform.

Figure 5.11. Different fields-of-view used in Reference Images generation.

[Figures 5.12 to 5.14 panels: x-axis Time, s]

Figure 5.12. 1x FOV for Reference Images generation.

Figure 5.13. 2x FOV for Reference Images generation.

Figure 5.14. 3x FOV for Reference Images generation.


5.4.1 Effect of Creating Reference Frames with Larger Simulated Field of View

The underlying assumption is that with a larger reference frame, the likelihood of capturing the real camera scene within the frame will increase. Secondly, due to the higher likelihood of full overlaps, the number of inlier matches will also increase, increasing the probability of getting an IMMAT match. The analysis results bear out these assumptions: in the sample output for a Camp Roberts trajectory shown in Figure 5.8, the number of position estimates increases when a larger field of view is used to generate the reference image frames.


5.4.2 Performance of Image Matching Algorithm for Different Terrains

The King City and Camp Roberts data sets cover two different terrain profiles, the former being flat terrain covered with farms and the latter being hilly, undulating terrain.


[Figure 5.15 panels: Down Leg Drop Rate; x-axis Altitude AGL, m]

Figure 5.15. Sample drop rates of IMMAT algorithm for Camp Roberts flights at various altitudes.

5.4.3 On Flat Terrain 

For terrain that exhibits repetitive patterns (in the case of King City, fields in one area look largely similar to the gridded field structure in another area), or when the terrain lacks any distinctive features that might allow it to be distinguished from another area, the IMMAT algorithm will fail to produce a match, contributing towards the drops.


Terrain images that were captured at low altitudes and with a small field-of-view in King City lacked distinguishing features, which led to high drop-rates and ineffective IMMAT matching. Some sample scenes where there are insufficient salient features in the images are given in Figure 5.18.

In King City, however, the number of spurious matches is significantly lower when compared with Camp Roberts.

Figure 5.16. Typical appearance of an IMMAT output by unconstrained search.

5.5 Analyzing Data Generated at Various Altitudes and in 
Different Flight Directions 

The IMMAT output results for up-leg and down-leg flights at various altitudes, for the different fields-of-view used to generate the reference images, are provided in Figures 5.2, 5.3 and 5.4, while those captured from King City are found in Figures 5.5, 5.6 and 5.7.

Increasing the field-of-view of the reference images led to a greater number of matches 
between control points in the reference image and the camera image but did not appear to 
improve the overall accuracy of the positional estimate. 


[Figure 5.17 legend: Camera position estimates, Nominal trajectory, Trajectory Start, Trajectory End]

Figure 5.17. Typical appearance of an IMMAT output by constrained search.

While the drop rates within the King City data sets for smaller fields-of-view were significant and led to limited analyzable information, the errors in up-leg and down-leg flights showed a clear differentiation between Easting, Northing and pitch estimates. The reason why the estimates separate out the way they do could be due to systematic biases, which require further investigation.

Figure 5.18. Featureless terrain.

Analyzing the King City results, we observe that the mean errors for Easting and Northing are around +200 m. The above-ground-level altitude error is also in line with the Camp Roberts data, at +50 m.

The underlying assumption is that with a larger reference frame, the likelihood of capturing the real camera scene within the frame will increase and, secondly, that due to the higher likelihood of full overlaps, the number of inlier matches will also increase, increasing the probability of getting an IMMAT match.

Figure 5.8 shows the graphical outputs of the IMMAT algorithm at various FOVs. As the FOV of the reference images increases, covering more of the terrain, the overall number of potential IMMAT matches increases, leading to a progressively denser plot of estimated positions.





CHAPTER 6: 

Conclusions and Future Research 


This chapter concludes the thesis. It begins by summarizing the work done for this thesis, then proceeds to review the key conclusions presented in the previous chapter. It closes with a consideration of the limitations of this research and provides suggestions for further studies.

6.1 Summary of Work Done 

This thesis enhanced our understanding of how to deploy image-matching algorithms for 
guided unmanned activities that may operate in a predetermined area, following a planned 
trajectory. In such a case, recently captured high-resolution images of the operational 
environment over which the planned trajectory is expected to fly can be pre-loaded onto the 
unmanned system. This information can be used as an alternative navigational aid when 
other on-board navigational equipment fails or cannot be used. One specific application 
for which this capability will be useful is in autonomous military operations within a 
GPS-degraded or a GPS-denied environment. 

This thesis was motivated by the possibility of leveraging camera sensors that are commonly 
available onboard UAVs to provide an alternative source of positional estimates. The 
purpose of pursuing this approach was to develop an alternative should other sources of 
location feeds fail to provide updates. Conditions warranting such an alternative include 
when the GPS fails to work due to area denial, or when the IMU drifts too much due to various 
aerial maneuvers or because the IMU has not received current positional updates. The 
approach taken for this thesis work relies on the preliminary study conducted in 2016 by [3], 
in which they described an idea to match a camera’s view with a geo-referenced library of 
reference images. This thesis extended the work done previously by conducting a functional 
analysis on the image matching navigation problem following Systems Engineering best 
practices [7], to better frame the problem. A list of MOPs was also established to better 
characterize the behavior, performance and applicability of the IMMAT algorithm. 

Having better framed the task, we proceeded to test the IMMAT concept further by conducting experiments on the core IMMAT algorithm based on flights held in different locations and at different altitudes. To evaluate the behavior of the image matching, real flight data captured in King City and Camp Roberts, California, were used for data analysis. These data came tagged with, most importantly, the ground truth GPS location of the platform as well as the attitude of the platform at a specific moment in flight.

Five major observations from the conducted evaluations are as follows:

1. The IMMAT approach relies on the feature-richness of both satellite and onboard camera images. To this end, a typical satellite image provides a resolution of 0.5 square meters per pixel regardless of the size of the ground footprint. The resolution of the onboard camera depends on the field of view (zoom setting), altitude and attitude. The best resolution is achieved in straight and level flight at low altitudes with maximum zoom. However, such a setting results in a very narrow field of view (a significant reduction in the number of features that can be used to match those of the satellite image). Specifically, with the TASE-200 sensor used in this research and a FOV of 35 degrees (Camp Roberts flights), a resolution of 0.5 m² per pixel can only be achieved when flying below 400 m AGL. Likewise for King City flights, where the videos were taken at a FOV of 10 degrees, only flights below 1200 m can achieve a resolution of 0.5 m² per pixel.

2. The texture of the surface has a major influence. Specifically, flying over the agricultural area (between Greenfield and King City) at low altitudes with a narrow FOV results in no features being detected in the onboard camera field of view when crop fields are under the flight path. Some features can be detected only when flying in between the crop fields. One way to mitigate this effect might be to increase the FOV, but that leads to a decrease in resolution and a possible failure to find matches between two images of different resolution. Still, this approach is worth exploring in the future.

3. Onboard camera stabilization (suppression of vibrations) plays a crucial role as well. 
In this research two aerial vehicles were used. The same sensor, a TASE-200, had 
much better stabilization when flying on the UAV at 25m/s compared to that of the 
manned Cessna-206 flying twice as fast. 

4. Varying terrain elevation also influences the accuracy of the IMMAT navigational 
solution. That includes a requirement to have a detailed terrain elevation map of the 
intended area of operations. 





5. Aircraft attitude plays a major role as well. In this research, IMMAT performance was evaluated only for straight and level flight. Future evaluation should consider IMMAT performance while turning, climbing and descending.
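
The dependence of onboard camera resolution on altitude and field of view, noted in the first observation, can be sketched with simple pinhole geometry. The sketch below assumes a nadir-pointing camera over flat terrain, treats resolution as a linear ground sample distance (GSD) in metres per pixel, and uses a hypothetical image width of 640 pixels; because it ignores camera tilt and the true sensor format, it is not expected to reproduce the altitude limits quoted above exactly.

```python
import math


def ground_sample_distance(altitude_m: float, fov_deg: float,
                           pixels_across: int) -> float:
    """Metres of ground per pixel for a nadir-pointing camera over flat terrain."""
    footprint_m = 2.0 * altitude_m * math.tan(math.radians(fov_deg) / 2.0)
    return footprint_m / pixels_across


def max_altitude_for_gsd(target_gsd_m: float, fov_deg: float,
                         pixels_across: int) -> float:
    """Highest altitude at which the camera still meets the target GSD."""
    return target_gsd_m * pixels_across / (
        2.0 * math.tan(math.radians(fov_deg) / 2.0))


PIXELS_ACROSS = 640  # hypothetical image width in pixels (assumption)
TARGET_GSD_M = 0.5   # 50 cm satellite ground sample distance

for fov_deg in (35.0, 10.0):
    ceiling = max_altitude_for_gsd(TARGET_GSD_M, fov_deg, PIXELS_ACROSS)
    print(f"FOV {fov_deg:4.1f} deg: target GSD met below about {ceiling:.0f} m AGL")
```

The sketch shows the trend described in the text: for a fixed sensor, a narrower field of view allows the same ground resolution to be held up to a higher altitude, at the cost of a smaller ground footprint.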

This research used a limited set of test data based on a TASE-200 sensor, which is not a high-end device. The sensor had some vibration isolation problems which, along with incorrect reporting of pan-tilt information (discovered during this research effort and reported to the manufacturer), resulted in an unusually high drop rate. Drops occurred when there were not enough matching points to construct a projective transformation, which is the basis of the IMMAT approach. Nevertheless, this thesis was able to conduct a detailed assessment of the overall performance of the IMMAT algorithm.

The main conclusion is that when all conditions are met (i.e., at least five matching points are found), the IMMAT algorithm can provide an estimate of aerial vehicle position as good as within 50 m of its true position (this value correlates with the satellite image resolution), and determine its attitude to within ±15 degrees for pitch and roll while finding its yaw angle to within just ±2 degrees.

Some additional observations follow.

• For the same field of view, as the flight profile increases in altitude, more of the local terrain is captured, with a consequent increase in the number of features and the likelihood of matches, and the drop rates for the IMMAT algorithm decrease.

• If an IMMAT drop does not occur, then the error associated with IMMAT estimation 
decreases with the altitude or pixel-per-meter on the ground. 

• This thesis relies on a simple two-dimensional projection of satellite imagery into the view of a would-be camera in flight. The lack of elevation data introduces perspective differences that may contribute to the errors in estimation by the IMMAT algorithm. To further quantify the errors due to projection, two further experiments can be conducted. First, real video imagery can be taken at various tilt angles, the most important being vertically downwards. The downward view matches best with the top-down satellite view and also obviates the need for terrain elevation information for projection purposes. The second is to enhance the projection algorithm by capturing a view of a three-dimensional digital elevation model textured with satellite imagery from the perspective of the camera, and comparing the estimates with the current approach.

• While the RIL can be created from a large collage of high-resolution satellite images prior to flight and then stored onboard the UAV, it can require quite a bit of space to store the frames. For example, a nominal trajectory that requires about 700 reference frames stored in high resolution amounted to 0.5 GB; storing only the extracted features and using only those features will require much less space. This presents an opportunity to investigate a method for storing the features that can work efficiently with the image-matching algorithm.

• As the IMMAT algorithm produces an estimate frame-by-frame and only when sufficient matches are found, there will be variations in the estimates when they are produced, and otherwise there are no estimates. The question is whether feeding the output of the IMMAT algorithm into a Kalman filtering process (1) produces a cleaner output, (2) produces (hopefully) more accurate positional predictions, and lastly (3) allows the previously known positional predictions to be fed as an initial positional estimate into the 6-DoF optimization procedure.
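
The last observation can be made concrete with a minimal sketch of a Kalman filter that tolerates drops. It assumes a constant-velocity model on a single axis (for example Easting), a position-only measurement, and hypothetical noise settings; a drop is represented by None, in which case the filter predicts without updating. This is an illustrative filter, not the one used to produce the Kalman estimates shown in this thesis.

```python
from typing import List, Optional, Sequence, Tuple


def kalman_track(measurements: Sequence[Optional[float]], dt: float,
                 meas_var: float, accel_var: float,
                 x0: Tuple[float, float], p0: float = 1e4) -> List[float]:
    """Constant-velocity Kalman filter on one axis; None marks an IMMAT drop."""
    pos, vel = x0
    # Symmetric covariance matrix [[p11, p12], [p12, p22]].
    p11, p12, p22 = p0, 0.0, p0
    # Process noise for a white-acceleration model.
    q11 = accel_var * dt ** 4 / 4.0
    q12 = accel_var * dt ** 3 / 2.0
    q22 = accel_var * dt ** 2
    filtered = []
    for z in measurements:
        # Predict: x = F x and P = F P F^T + Q, with F = [[1, dt], [0, 1]].
        pos = pos + vel * dt
        p11 = p11 + 2.0 * dt * p12 + dt * dt * p22 + q11
        p12 = p12 + dt * p22 + q12
        p22 = p22 + q22
        if z is not None:
            # Update with a position-only measurement (H = [1, 0]).
            s = p11 + meas_var
            k1, k2 = p11 / s, p12 / s
            residual = z - pos
            pos += k1 * residual
            vel += k2 * residual
            p11, p12, p22 = (1.0 - k1) * p11, (1.0 - k1) * p12, p22 - k2 * p12
        filtered.append(pos)
    return filtered


# Illustrative track: 25 m/s ground speed, 1 s between frames, every third dropped.
truth = [25.0 * k for k in range(1, 11)]
measured = [z if k % 3 else None for k, z in enumerate(truth)]
track = kalman_track(measured, dt=1.0, meas_var=25.0, accel_var=1.0,
                     x0=(0.0, 25.0))
print(len(track), "filtered positions for", len(measured), "frames")
```

Because the prediction step always runs, the filter yields a position for every frame, including dropped ones, and its latest output could serve as the initial guess for the next 6-DoF optimization.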


6.2 Future Development 

There are various opportunities to study areas where the entire image-matching navigation procedure can be optimized. One area of possible further study is to optimize the number of reference frames, answering the question of what minimum number of frames is required, below which the performance of the image-matching algorithm degrades. One possible idea is to take advantage of the fact that the code base today is able to plot the viewpoint of the camera onto the aerial view of the area of operations. Using this information, it is possible to work out how far apart the reference frames can be spread out and still contain the viewpoint of the camera. The algorithm as designed today generates reference frames based on a nominal trajectory that has been divided up into evenly spaced segments, and then generates a projection at those points on the ground, given the UAV's nominal pose.
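
The idea of dividing a nominal trajectory into evenly spaced segments can be sketched as follows. The sketch assumes that the trajectory is a polyline of (easting, northing) points in metres and that the spacing is a free parameter; the function name and the example values are hypothetical and do not come from the thesis code.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (easting, northing) in metres


def evenly_spaced_points(track: List[Point], spacing_m: float) -> List[Point]:
    """Resample a polyline at a fixed arc-length step, starting at its first point."""
    if spacing_m <= 0:
        raise ValueError("spacing_m must be positive")
    if not track:
        return []
    points = [track[0]]
    distance_to_next = spacing_m
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        travelled = 0.0
        # Drop a point every time the remaining segment covers the next step.
        while seg_len - travelled >= distance_to_next:
            travelled += distance_to_next
            t = travelled / seg_len
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            distance_to_next = spacing_m
        # Carry the unused part of this segment into the next one.
        distance_to_next -= seg_len - travelled
    return points


# Illustrative straight 100 m leg sampled every 25 m.
reference_points = evenly_spaced_points([(0.0, 0.0), (100.0, 0.0)], 25.0)
print(reference_points)
```

Varying the spacing in such a routine and re-running the matching would be one way to study how sparse the reference frames can become before performance degrades.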

6.2.1 Creating a Feature Rich Reference Image Library 

During the creation of the RIL, there were instances where the reference images selected had few features. These frames were still included in the RIL to keep the algorithm simple, so that the thesis could proceed to investigate the image-matching performance of the 6-DoF UAV pose estimation procedure instead. This is therefore one area of immediate future work, where a technique can be developed for selecting reference image frames that have sufficient features, yet are sufficiently spaced out and representative of the nominal trajectory to be covered. In so doing, the drop-rates for the image-matching algorithm will be immediately reduced, improving the stability of the image-matching navigation process.

Another area of study is on the skip-rate of the incoming video stream, to answer the question 
of how many frames in an incoming video stream can be skipped to avoid unnecessary 
processing, but still allow it to provide accurate estimates on the UAV’s position. 

6.2.2 Investigating Image Feature Extraction Ability of Various Algorithms
for Different Terrain Types

In the previous section, data supports the claim that drop-rates are highly associated with the 
feature extraction capabilities of the image feature extraction algorithm used, if the scenes 
between the reference image and the camera view are indeed overlapping. 

As the feature extraction algorithm is a component that can be substituted, future work can investigate the use of other feature extraction schemes such as SIFT or BRISK. Such work can investigate which extraction methods are appropriate for the various terrain types.

6.2.3 Managing Drops in Image Matching 

On the whole, during the batch processing of the data, high IMMAT drop-rates were observed, with some tracks reaching 100%. Continued work to reduce the drop rates is needed to improve the robustness and reliability of the current IMMAT algorithm so that it can function as a viable source of navigational updates.

During the image-matching procedure, some scenes may not provide adequate feature pairings for the attitude of the UAV to be estimated. Drops may happen for various reasons (those that are known were previously discussed in Section 3.3.1), but more flight data over different types of terrain will be useful to ascertain whether it is the performance of the feature-extraction algorithm that is affecting the overall performance, and whether the feature-extraction algorithm is terrain dependent. Knowing this information will be useful for tuning the algorithm, which can be done ahead of time, prior to any unmanned flights. This thesis relies on the SURF algorithm for feature extraction, so further investigation can be conducted using different feature extraction algorithms for areas where the SURF algorithm gave poor results.

It is assumed that the projected view of the terrain is an adequate approximation of the camera view for the purposes of image-matching. This assumption may be violated should the re-projected view of the planar satellite image differ from the actual perspective view of the physical terrain. To study the errors introduced by neglecting elevation in the projection, further work needs to be done with a digital elevation model of the terrain textured with satellite imagery.

6.2.4 Using Alternate Video Streams 

This thesis primarily assessed the effectiveness of using the day camera output of a UAV. Some UAVs, however, may also be equipped with IR cameras, which image the environment in a different spectral band. In terrain where the IMMAT algorithm may produce a poor estimate when using a day camera, the output could potentially be substituted with the view from the IR camera, which may reveal features that are otherwise imperceptible in daylight.

6.2.5 Studying the Effect of Actual Camera Field-of-View 

Based on the data sets available, a horizontal field-of-view of 10.5° was used for the King City recording and 35.26° for the Camp Roberts recording. While the drop-rates seen in the King City flights were significantly higher than those for Camp Roberts, it is not possible to conclude whether this was the result of a smaller actual camera FOV or of terrain in King City that made it challenging for the feature extraction algorithm to produce a match. Further studies using more data sets with varying actual camera fields-of-view are required to understand this aspect of the algorithm.





6.2.6 Using High-Fidelity Simulated Urban Environment Fly-By as 
Reference Images 

There are systems that can generate high-fidelity simulations based on the inputs of a 
fly-through route. One example is that used by the Urban Redevelopment Authority of 
Singapore [19], which uses the system to visualize redevelopment plans ahead of time 
before approving any master plans (see Figure 6.1). While it is difficult to replicate the 
environment accurately for remote places, it should be possible to get a reasonably accurate 
model of a 3D urban environment. One pertinent research question is whether the image-matching algorithm would still be able to provide reasonable estimates of position despite using a simulated scene as reference.



Figure 6.1. Simulation of an urban environment by Urban Redevelopment 
Authority of Singapore. Source: [19]. 





APPENDIX A: 
TASE 200 Output Data 


The TASE200 sensor system bundles information with each frame captured, at 30 Hz. This appendix provides a description of the TASE200 sensor data format logged by the onboard sensor. Table A.1 shows a comprehensive listing of all the meta-data that is captured by the TASE system.


Table A.1. TASE Meta-data available for analysis.

No.  Field                   Bytes
1    GPS Day                 41
2    GPS Hour                42
3    GPS Minute              43
4    GPS Second              44-47
5    Second since reset      136-139
6    Second since midnight   12-15
7    Gimbal Lat              56-63
8    Gimbal Lon              65-71
9    Gimbal Alt              72-75
10   Gimbal Pan              24-27
11   Gimbal Tilt             28-31
12   Gimbal Roll             32-35
13   Image Lat               192-199
14   Image Lon               200-207
15   Image Alt               208-211
16   Axis Pan Rate           140-143
17   Axis Tilt Rate          144-147
18   Axis Roll Rate          148-151
19   Mount Pan Rate          152-155
20   Mount Tilt Rate         156-159
21   Mount Roll Rate         160-163
22   Roll                    88-91
23   Pitch                   92-95
24   Yaw                     96-99
25   Mount Roll              260-263
26   Mount Pitch             264-267
27   Mount Yaw               268-271
28   VN                      76-79
29   VE                      80-83
30   VD                      84-87
31   Heading                 316-319
32   HFOV                    168-171
33   VFOV                    172-175
34   HFOVmax                 176-179
35   HFOVmin                 180-183
36   Zoom                    186-187
37   HFOVmaxC2               212-215
38   HFOVminC2               216-219
39   Transx                  not listed
40   Transy                  not listed
41   GPS Satellites          48-49
42   GPS Status              50-51
43   GPS PDOP                52-55
44   Magx                    310-311
45   Magy                    312-313
46   Magz                    314-315
47   Focus                   256-257
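
As an illustration of how the byte ranges in the table might be used, the sketch below decodes a few of the 4-byte fields from a single meta-data record. The table gives only byte positions, so the data types and byte order are assumptions here: each field is treated as a little-endian 32-bit float, and the byte ranges are treated as zero-based and inclusive. The record used is synthetic, and the function and field names are hypothetical.

```python
import struct
from typing import Dict

# Byte ranges copied from the table (start, end inclusive).
FLOAT_FIELDS = {
    "gimbal_tilt": (28, 31),
    "gimbal_roll": (32, 35),
    "roll": (88, 91),
    "pitch": (92, 95),
    "yaw": (96, 99),
}


def parse_float_fields(record: bytes) -> Dict[str, float]:
    """Decode selected 4-byte fields, ASSUMING little-endian float32 values."""
    values = {}
    for name, (start, end) in FLOAT_FIELDS.items():
        chunk = record[start:end + 1]
        if len(chunk) != 4:
            raise ValueError(f"record too short for field {name!r}")
        values[name] = struct.unpack("<f", chunk)[0]
    return values


# Build a synthetic 320-byte record to exercise the parser (not real TASE data).
record = bytearray(320)
struct.pack_into("<f", record, 92, 2.5)    # pitch
struct.pack_into("<f", record, 96, -90.0)  # yaw
fields = parse_float_fields(bytes(record))
print(fields["pitch"], fields["yaw"])
```

If the actual encoding differs (for example, one-based offsets, big-endian values or integer fields), only the FLOAT_FIELDS table and the format string would need to change.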








APPENDIX B: 
Satellite Images Meta-data 


This appendix summarizes the meta-data of the satellite images that were downloaded for 
King City. 


Table B.1. Meta-data for the satellite tiles downloaded for King City.

Property                  Tile 1                Tile 2                Tile 3
Product Type              Panchromatic          Panchromatic          Panchromatic
Source                    WV01                  WV01                  WV01
Source Unit               Strip                 Strip                 Strip
Ground Sample Distance    50 cm                 50 cm                 50 cm
NIIRS                     4.7                   4.8                   4.9
Acquisition Date          2017-06-01 22:07 UTC  2017-07-18 21:58 UTC  2017-07-18 21:59 UTC
Cloud Cover               0.00%                 0.00%                 0.00%
Has Cloudless Geometry    Yes                   Yes                   Yes
Off Nadir Angle           28.6397°              24.9357°              16.9780°
Sun Elevation             59.3980°              61.9622°              61.8045°
Sun Azimuth               251.1663°             243.9128°             244.1939°
Data Layer                daily_take            daily_take            daily_take
Crs From Pixels           EPSG:4326             EPSG:4326             EPSG:4326
Precise Geometry          Yes                   Yes                   Yes
Per Pixel X               4.50E-06              4.50E-06              4.50E-06
Per Pixel Y               -4.50E-06             -4.50E-06             -4.50E-06
CE90 Accuracy             8.4                   8.4                   8.4
RMSE Accuracy             3.914259087           3.914259087           3.914259087
Spatial Accuracy          1:12,000              1:12,000              1:12,000
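
The Per Pixel X and Per Pixel Y values are given in degrees (EPSG:4326), whereas the ground sample distance is given in centimetres. The sketch below shows one rough way to relate the two. It assumes a spherical Earth with about 111,320 m per degree of latitude and an approximate latitude of 36.2° N for King City; both values are assumptions made for illustration and are not taken from the tile meta-data.

```python
import math
from typing import Tuple

METERS_PER_DEGREE_LAT = 111_320.0  # rough spherical-Earth value (assumption)


def pixel_size_m(per_pixel_x_deg: float, per_pixel_y_deg: float,
                 latitude_deg: float) -> Tuple[float, float]:
    """Approximate (east-west, north-south) ground size of one pixel in metres."""
    north_south = abs(per_pixel_y_deg) * METERS_PER_DEGREE_LAT
    east_west = (abs(per_pixel_x_deg) * METERS_PER_DEGREE_LAT
                 * math.cos(math.radians(latitude_deg)))
    return east_west, north_south


KING_CITY_LAT_DEG = 36.2  # approximate latitude of King City (assumption)
ew, ns = pixel_size_m(4.50e-06, -4.50e-06, KING_CITY_LAT_DEG)
print(f"east-west: {ew:.2f} m, north-south: {ns:.2f} m")
```

Under these assumptions the north-south pixel size comes out close to 0.5 m, in line with the 50 cm ground sample distance listed in the table, while the east-west size is smaller because meridians converge away from the equator.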





APPENDIX C: 

Schematic of MATLAB Program Flow 


This appendix describes the flow of the MATLAB program. At a high level, the software is broken up into several major functional aspects stored in different files: TracksDB.m, CreateSatelliteImageryAndTransforms.m, GenerateRawTrajectoryVideoClip.m, GenerateRawTrajectory.m, GenerateNominalTrajectory.m, GenerateReferenceFrames.m and ImageMatchingAlgorithm.m.


TracksDB.m 

After segmenting the raw tracks in the TASE videos into raw trajectories, this file records the starting and ending indices in TracksDB. The average above-ground-level altitude of each track and its assigned track name are also stored in the database for easy reference in the rest of the program.


CreateSatelliteImageryAndTransforms.m 

This script takes a folder of satellite image tiles and stitches them together into a large canvas. 
The script also computes the transform that maps each image pixel to UTM coordinates. 


GenerateRawTrajectory.m

This function takes the starting and ending indices of the meta-data associated with the camera frames and pre-processes the data to remove known biases.


GenerateRawTrajectoryVideoClip.m 

This function takes the starting and ending indices from the TracksDB and assembles the separate frames into a video clip that will be fed into the IMMAT algorithm.


[Figure C.1 schematic; recoverable labels: Horizontal and vertical FOV, Resolution, zoom level, focal length; Interpolate Time, Compute Aspect Ratio; ReadingJpegSeries_RD.m; CreateSatelliteImageryAndTransforms; F(x,y) = UTM; Output: Thumbnail, Thumbnail xy to UTM tx, Full res, Full res xy to UTM tx, Map corners in UTM, Map edges in WGS lat lon]

Figure C.1. Schematic for CreateSatelliteImageryAndTransforms.m


GenerateNominalTrajectory.m 

This function generates a nominal trajectory by smoothing the raw trajectory data.
The nominal trajectory contains position and pose information for a would-be
camera in flight, to be used for generating reference frames.
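
The smoothing method is not specified here, so the following Python sketch uses a centered moving average purely as a stand-in. The window size and sample values are made up, and the same function would be applied to each position and pose channel separately.

```python
def moving_average(values, half_window):
    """Centered moving average; the window shrinks near the ends."""
    if half_window < 0:
        raise ValueError("half_window must be non-negative")
    smoothed = []
    n = len(values)
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# A noisy made-up altitude channel, smoothed with a 3-sample window.
raw_altitude = [0.0, 3.0, 0.0, 3.0, 0.0]
nominal_altitude = moving_average(raw_altitude, 1)
```

A wider window gives a smoother nominal trajectory at the cost of flattening genuine maneuvers.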


GenerateReferenceFrames.m

This function takes the location and pose of a camera as described in the nominal trajectory
and performs a projective transformation of the top-down satellite view into the perspective
view of the camera.
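
A projective transformation of image points can be written as a 3x3 homography applied in homogeneous coordinates. The Python sketch below shows only that point mapping. The matrix values are arbitrary, and deriving the matrix from the camera's location and pose, as the MATLAB function does, is outside the scope of the sketch.

```python
def apply_homography(H, x, y):
    """Map point (x, y) through the 3x3 homography H (list of rows)."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Dividing by w converts back from homogeneous to image coordinates.
    return (xh / w, yh / w)


# Arbitrary example matrix: a shift plus a perspective term along y.
H_example = [
    [1.0, 0.0, 10.0],
    [0.0, 1.0, 20.0],
    [0.0, 0.5, 1.0],
]
warped_point = apply_homography(H_example, 4.0, 2.0)
```

The nonzero entry in the bottom row is what distinguishes a perspective view from an affine one: points farther along y are divided by a larger w and so appear compressed.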


ImageMatchingAlgorithm.m 

ImageMatchingAlgorithm.m is the core function that executes the IMMAT procedure. It first
extracts features from both the reference image and the camera image, then performs a rough
correspondence matching, and finally passes that information to
estimateGeometricTransform, which culls outlier matches and then
estimates a projective transformation between the reference frame and the camera image.
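
To illustrate what culling outlier matches means, the Python sketch below runs a RANSAC-style consensus loop on a deliberately simplified model: a pure 2-D translation instead of the projective transformation used in the thesis. It is not the algorithm inside estimateGeometricTransform. The threshold, iteration count, seed and sample matches are all assumptions.

```python
import random


def ransac_translation(matches, threshold=2.0, iterations=50, seed=0):
    """Find the dominant shift among point matches and return its inliers.

    matches: list of ((x_ref, y_ref), (x_cam, y_cam)) pairs.
    Returns (mean shift of the inliers, list of inlier matches).
    """
    if not matches:
        raise ValueError("no matches supplied")
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # One match is enough to hypothesize a translation.
        (xr, yr), (xc, yc) = rng.choice(matches)
        dx, dy = xc - xr, yc - yr
        inliers = [
            (ref, cam) for ref, cam in matches
            if abs(cam[0] - ref[0] - dx) <= threshold
            and abs(cam[1] - ref[1] - dy) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    n = len(best_inliers)
    shift = (
        sum(cam[0] - ref[0] for ref, cam in best_inliers) / n,
        sum(cam[1] - ref[1] for ref, cam in best_inliers) / n,
    )
    return shift, best_inliers


# Five made-up matches sharing a (5, -3) shift, plus two gross outliers.
sample_matches = [
    ((0, 0), (5, -3)), ((10, 4), (15, 1)), ((7, 9), (12, 6)),
    ((3, 8), (8, 5)), ((20, 20), (25, 17)),
    ((10, 10), (40, 70)), ((20, 5), (0, 0)),
]
shift, inliers = ransac_translation(sample_matches)
```

A projective model needs at least four matches per hypothesis instead of one, but the accept-or-reject logic follows the same pattern.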



















[Schematics. Labels recovered from the two diagrams: GenerateNominalTrajectory; GenerateRawTrajectoryVideoClip; "Produces camera view for IMMAT algorithm"; "Goes to IMMAT algorithm".]

Figure C.2. Schematic for GenerateNominalTrajectory.m

Figure C.3. Schematic for GenerateReferenceFrames.m


ImageMatchingAlgorithm.m then calls estimateCameraPositionandOrientation, which projects
the inlier points found in the camera image onto the ground plane and minimizes the
displacement error between the observed points and the re-projected points.
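
The displacement error being minimized can be sketched as a root-mean-square distance between observed and re-projected points. In the Python example below, the projection model is a toy stand-in that only shifts ground points by a candidate offset, and the minimization is a search over a few hypothetical candidates. The actual function estimates full camera position and orientation, and its optimizer is not described here.

```python
import math


def rms_displacement(observed, reprojected):
    """Root-mean-square distance between paired 2-D points."""
    if not observed or len(observed) != len(reprojected):
        raise ValueError("point lists must be non-empty and equal in length")
    total = sum(
        (xo - xr) ** 2 + (yo - yr) ** 2
        for (xo, yo), (xr, yr) in zip(observed, reprojected)
    )
    return math.sqrt(total / len(observed))


def best_candidate(observed, candidates, project):
    """Return the candidate pose whose projection best fits the observations."""
    return min(candidates, key=lambda pose: rms_displacement(observed, project(pose)))


# Toy setup: a "pose" is just a 2-D offset applied to known ground points.
ground_points = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
observed_points = [(3.0, 4.0), (13.0, 4.0), (3.0, 14.0)]


def project(pose):
    return [(x + pose[0], y + pose[1]) for x, y in ground_points]


error_at_origin = rms_displacement(observed_points, project((0.0, 0.0)))
best_pose = best_candidate(observed_points, [(0.0, 0.0), (3.0, 4.0), (5.0, 5.0)], project)
```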




[Schematic. Labels recovered from the diagram: Camera Video; Reference Image Library; Number of Frames; Nominal trajectory data; ImageMatchingAlgorithm.m; Finish. One further label also survives: "Goes to Generate Reference Frames as truth".]

Figure C.4. Schematic for ImageMatchingAlgorithm.m



















List of References 


[1] D. Titterton and J. L. Weston, Strapdown inertial navigation technology. lET, 2004, 
vol. 17. 

[2] W. Kong, G. Egan, and T. Cornall, “Eeature-based navigation for UAVs,” in Intelli¬ 
gent Robots and Systems, 2006 lEEE/RSJ International Conference on. IEEE, 2006, 
pp. 3539-3543. 

[3] O. Yakimenko and R. Decker, “On the development of an image matching naviga¬ 
tion algorithm for aerial vehicles,” Proceedings of the IEEE Aerospace Conference, 
Big Sky, MT, 2016. 

[4] U.S. Government. GPS Constellation Arrangement. [Online]. Available: http://www. 
gps.gov/systems/gps/space/. Accessed September 13, 2017. 

[5] GPS Triangulation. [Online]. Available: http://gis.depaul.edu/shwang/teaching/ 
geog258/GPS.htm. Accessed September 13, 2017. 

[6] Department of Defense. (2011). The Unmanned Systems Integrated Roadmap 
PY2011-2036. [Online]. Available: https://my.nps.edu/documents/106607930/ 
106914584/UxV-i-DoD-i-Integrated-i-Roadmap-i-2011.pdf/0fl23fbl-eflf-4842-9855- 
85al36b28a93. Accessed September 13, 2017. 

[7] B. S. Blanchard and E. J. Benjamin, Systems Engineering and Analysis, 5th ed. En¬ 
glewood Cliffs, NJ, Prentice Hall, 2010. 

[8] K. W. Eure, C. C. Quach, S. E. Vazquez, E. E. Hogge, and B. E. Hill, “An applica¬ 
tion of UAV attitude estimation using a low-cost inertial navigation system,” 2013. 

[9] Omnidirectional Vision at University of Pennsylvania, (n.d.). University of Pennsyl¬ 
vania. [Online]. Available: http://www.cis.upenn.edu/~kostas/omni/. Sep 13, 2017. 

[10] Omnidirectional Vision at University of Essex, (n.d.). University of Essex. [Online]. 
Available: http://cswww.essex.ac.uk/mv/images.html. Sep 13, 2017. 

[11] I. E. Mondragon, P. Campoy, C. Martinez, and M. Olivares, “Omnidirectional vi¬ 
sion applied to unmanned aerial vehicles (UAVs) attitude and heading estimation,” 
Robotics and Autonomous Systems, vol. 58, no. 6, pp. 809-819, 2010. 

[12] DigitalGlobe Satellite Imagery. (2017). [Online]. Available: https://www. 
digitalglobe.com/. Accessed 16 Apr 2017. 


83 




[13] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Interna¬ 
tional Journal of ComputerVision, vol. 60, no. 2, pp. 91-110, 2004. 

[14] S. Leutenegger, M. Chli, and R. Y. Siegwart, “BRISK: Binary robust invariant 
scalable keypoints,” in 2011 IEEE International Conference on Computer Vision 
(ICCV). IEEE, 2011, pp. 2548-2555. 

[15] H. Bay, T. Tuytelaars, and E. Van Gool, “SURE: Speeded up robust features,” Com¬ 
puter Vision-ECCV 2006, pp. 404-417, 2006. 

[16] M. A. Eischler and R. C. Bolles, “Random sample consensus: a paradigm for model 
fitting with applications to image analysis and automated cartography,” Communica¬ 
tions of the ACM, vol. 24, no. 6, pp. 381-395, 1981. 

[17] S. Choi, T. Kim, and W. Yu, “Performance evaluation of RANSAC family,” Journal 
of Computer Vision, vol. 24, no. 3, pp. 271-300, 1997. 

[18] R. Earagher, “Understanding the basis of the Kalman filter via a simple and intuitive 
derivation [lecture notes],” IEEE Signal Processing Magazine, vol. 29, no. 5, pp. 
128-132, 2012. 

[19] “Channel News Asia, URA explores creating 3D digital models of city area using 
drones,” May 2015. Available: http://www.channelnewsasia.com/news/singapore/ 
ura- explores- creating- 3d-digital-models- of- city- area- using- drone- 8269966 


84 



Initial Distribution List 


1. Defense Technical Information Center 
Ft. Belvoir, Virginia 

2. Dudley Knox Library 
Naval Postgraduate School 
Monterey, California 






